Introduction

In Part 2, you built four HTTPS inference containers running on Jetson Nano hardware. They are healthy, serving the DSDL-native protocol, and waiting for requests. In this post, you will wire everything together: Splunk DSDL installed on the search head, HEC configured on the indexer cluster, and the exact configuration files that make DSDL’s fit and apply commands route correctly to your containers.

This is the most configuration-dense part of the series. It is also where most implementations break down – not because the concepts are complex, but because DSDL 5.2.3 has several undocumented behaviors that only become visible when you read its Python source code. This post documents those behaviors explicitly so you do not have to discover them through trial and error.

Prerequisites

  • Splunk Enterprise 10.0 with Enterprise Security installed
  • Python for Scientific Computing add-on (Splunk_SA_Scientific_Python_linux_x86_64) installed
  • AI Toolkit / MLTK (Splunk_ML_Toolkit 5.6.4 or later) installed and enabled
  • All four Nano inference containers running and verified from Part 2
  • A Splunk indexer cluster managed by a cluster manager node
  • Lab CA certificate at ~/lab-pki/lab-ca.crt on the search head

Step 1 – Verifying Prerequisites Before Installing DSDL

DSDL must be installed last. Installing it before MLTK or PSC causes dependency failures that require removing apps to fix. Verify both are present:

| rest /services/apps/local
| search title IN ("Splunk_ML_Toolkit", "Splunk_SA_Scientific_Python_linux_x86_64")
| table title, version, disabled

Both apps should appear with disabled set to 0 (meaning enabled). Also verify MLTK permissions are set to Global – DSDL cannot find MLTK commands if they are scoped to a single app:

  1. Navigate to Apps > Manage Apps
  2. Find Machine Learning Toolkit and click Permissions
  3. Confirm Object should appear in is set to All Apps
  4. If you are running Splunk Enterprise Security, this is almost certainly already set correctly since ES requires it
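You can also confirm the sharing level from the CLI. Global sharing is recorded as export = system in the app's metadata files, so a quick grep (a sanity check, not a required step) tells you immediately:

grep -H 'export' $SPLUNK_HOME/etc/apps/Splunk_ML_Toolkit/metadata/*.meta
# Expect at least one line containing: export = system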

Step 2 – Configuring HEC on the Indexer Cluster

In a Splunk indexer cluster, HEC configuration must be distributed via the cluster manager bundle. Do not configure HEC through the Splunk web UI on individual indexers – you will end up with inconsistent token configurations across peers.

Create the HEC configuration app on the cluster manager. The ta_jetson_hec app contains four files: inputs.conf for the HEC token, indexes.conf to create the target index, app.conf packaging metadata, and props.conf for the jetson_inference sourcetype (shown after the other three):

# On the cluster manager
mkdir -p $SPLUNK_HOME/etc/master-apps/ta_jetson_hec/{default,local,metadata}

# inputs.conf -- HEC global settings and shared token
cat > $SPLUNK_HOME/etc/master-apps/ta_jetson_hec/local/inputs.conf << 'EOF'
[http]
disabled=0
enableSSL=1
port=8088
dedicatedIoThreads=2
maxSockets=0
maxThreads=0
useDeploymentServer=0

[http://jetson-inference-nodes]
disabled=0
token=21a7297a-b7fb-4209-b719-72c4fb58f38d
index=ai_inference
indexes=ai_inference
sourcetype=jetson_inference
EOF

# indexes.conf -- create the target index on all peers
cat > $SPLUNK_HOME/etc/master-apps/ta_jetson_hec/local/indexes.conf << 'EOF'
[ai_inference]
disabled=false
homePath   = $SPLUNK_DB/ai_inference/db
coldPath   = $SPLUNK_DB/ai_inference/colddb
thawedPath = $SPLUNK_DB/ai_inference/thaweddb
maxTotalDataSizeMB=10240
frozenTimePeriodInSecs=7776000
EOF

# app.conf
cat > $SPLUNK_HOME/etc/master-apps/ta_jetson_hec/default/app.conf << 'EOF'
[launcher]
author=SecurityLab
version=1.0.0
description=HEC receiver for Jetson Nano DSDL inference nodes

[package]
id=ta_jetson_hec

[ui]
is_visible=false
label=TA Jetson HEC
EOF
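The props.conf mentioned above rounds out the app. Since the containers post JSON event bodies (the same shape as the curl test below), search-time extraction needs only KV_MODE. This is a minimal sketch; extend it with timestamp or truncation settings if your events need them:

# props.conf -- search-time JSON field extraction for the sourcetype
cat > $SPLUNK_HOME/etc/master-apps/ta_jetson_hec/local/props.conf << 'EOF'
[jetson_inference]
KV_MODE = json
EOF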

Validate and push the bundle:

# Validate -- confirm no config errors before pushing
splunk validate cluster-bundle --check-restart

# Push to all peers
splunk apply cluster-bundle
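The push is asynchronous; you can watch it complete from the cluster manager:

# Confirm all peers report the new bundle as active
splunk show cluster-bundle-status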

Also copy the app to the search head so its props.conf definitions are available at search time. The search head will ignore the HEC [http] stanzas – only indexer peers act on those:

cp -r $SPLUNK_HOME/etc/master-apps/ta_jetson_hec \
      $SPLUNK_HOME/etc/apps/ta_jetson_hec

Verify HEC is working from each Nano. Run this from node 1 and replace INDEXER1_IP with your actual indexer IP:

curl -k https://INDEXER1_IP:8088/services/collector \
  -H 'Authorization: Splunk 21a7297a-b7fb-4209-b719-72c4fb58f38d' \
  -H 'Content-Type: application/json' \
  -d '{"sourcetype":"jetson_inference","index":"ai_inference",
       "host":"k8clstr01cm","event":{"test":"hec_ok"}}'

# Expected: {"text":"Success","code":0}

If you get No route to host, the host-based firewall on the indexer is blocking port 8088. On each indexer:

sudo iptables -I INPUT -p tcp --dport 8088 -j ACCEPT
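Note that a rule inserted this way does not survive a reboot. Persist it with your distribution's usual mechanism – for example, on Debian/Ubuntu:

# Example persistence for Debian/Ubuntu; adjust for firewalld/nftables distros
sudo apt-get install -y iptables-persistent
sudo netfilter-persistent save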

Step 3 – Installing DSDL on the Search Head

Install DSDL from Splunkbase (app ID 4607). If the search head cannot reach Splunkbase directly, download it on a machine with internet access and install via CLI:

$SPLUNK_HOME/bin/splunk install app \
  /tmp/splunk-app-for-data-science-and-deep-learning_523.tgz \
  -auth admin:YOUR_PASSWORD

When prompted, restart Splunk. On a search head running Enterprise Security, restart takes 2-4 minutes including ES initialization. Do not interact with DSDL until ES Incident Review loads normally – attempting DSDL configuration while ES is still initializing causes false Test & Save failures.

Step 4 – Preparing the Lab CA for DSDL

This is the step most implementations miss. DSDL’s endpoint() method uses Python’s urllib library (not requests) with a dynamically constructed SSL context. The relevant code in MLTKContainer.py:

server_cert = ssl.get_server_certificate((url_parsed.hostname, url_parsed.port))
ssl_context = ssl.create_default_context(cadata=server_cert)

ssl.get_server_certificate() fetches only the leaf certificate – it does not fetch the CA certificate. When DSDL tries to verify the chain with ssl.create_default_context(cadata=server_cert), it has the Nano’s certificate but not the CA that signed it. This produces the error SSL CERTIFICATE_VERIFY_FAILED: self-signed certificate in certificate chain – the same error you get when you run curl without -k against a self-signed cert.
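You can reproduce the failing path outside DSDL to confirm this is the error you are hitting. This sketch mimics the two lines above – substitute your node's IP:

$SPLUNK_HOME/bin/splunk cmd python3 << 'PYEOF'
import ssl, urllib.request

# Mimic DSDL: fetch only the leaf certificate, then trust only that
leaf = ssl.get_server_certificate(('10.1.30.23', 8501))
ctx = ssl.create_default_context(cadata=leaf)

try:
    urllib.request.urlopen('https://10.1.30.23:8501/', context=ctx)
    print('UNEXPECTED: verification passed')
except Exception as e:
    # Typically CERTIFICATE_VERIFY_FAILED, matching the DSDL error above
    print('FAILED as expected:', e)
PYEOF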

The fix is to tell DSDL to use the CA file directly instead of the dynamically fetched cert. DSDL already supports this through the endpoint_cert_filename_or_path key in docker.conf:

# In MLTKContainer.py endpoint_cert_filename_or_path handling:
if endpoint_cert_filename_or_path:
    ssl_context = ssl.create_default_context(cafile=endpoint_cert_filename_or_path)

Copy the lab CA to the DSDL app directory:

cp ~/lab-pki/lab-ca.crt \
   $SPLUNK_HOME/etc/apps/mltk-container/local/lab-ca.crt
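Before touching any Splunk configuration, it is worth confirming that the CA file actually validates the node's certificate – this rules out a PKI mistake before you start debugging DSDL:

# Expect 'Verify return code: 0 (ok)' in the output
openssl s_client -connect 10.1.30.23:8501 \
  -CAfile $SPLUNK_HOME/etc/apps/mltk-container/local/lab-ca.crt \
  </dev/null 2>/dev/null | grep 'Verify return code'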

Step 5 – Configuring DSDL via the Setup UI (Environment 1)

Navigate to DSDL app > Configuration > Setup. Fill in the Docker Settings for Environment 1:

Field          Value
Docker Host    tcp://10.1.30.23:2375
Endpoint URL   https://10.1.30.23:8501
External URL   https://10.1.30.23:8501

Leave the Splunk Docker Logging fields blank. The Jupyter Notebook settings are entirely optional for this use case – skip them. Click Test & Save. A green checkmark confirms Splunk can reach the container on both port 2375 (Docker API) and port 8501 (inference endpoint).

The Setup UI only configures Environment 1. Environment 2 is added directly to docker.conf.

Step 6 – Writing the Complete docker.conf

This is the most important configuration file. Splunk's config layering merges default/docker.conf (which ships with blank values for every key) with local/docker.conf, with local taking precedence key by key. Any key you omit from local therefore falls back to the blank value in default, which DSDL treats as unset. To avoid that, explicitly define every key that appears in default/docker.conf in your local file.

Write the complete file:

cat > $SPLUNK_HOME/etc/apps/mltk-container/local/docker.conf << 'EOF'
[connection]
api_token = WSDG3A5R94Z0D822K6FWRO6AXZAQ0F36NCI6RELCMKGO7HRZ03HRME7B2BJ67MWH
api_workers = 1
container_enable_https = 1
container_enable_keepalive = 0
docker_url = tcp://10.1.30.23:2375
docker_network =
# This is the fix for DSDL's SSL verification with self-signed certs
# DSDL reads this and builds ssl.create_default_context(cafile=<path>)
# which correctly includes the CA chain rather than just the leaf cert
endpoint_cert_filename_or_path = /home/splunk/etc/apps/mltk-container/local/lab-ca.crt
endpoint_cert_check_hostname = 0
endpoint_hostname = https://10.1.30.23:8501
endpoint_hostname_external = https://10.1.30.23:8501
docker_logging_endpoint_hostname =
docker_logging_splunk_token =
image_pull_secrets = None
in_cluster_mode = false
is_configured_complete = 1
olly_enabled = 0
olly_splunk_access_token =
olly_otel_endpoint =
olly_otel_service_name =
splunk_access_enabled = 0
splunk_access_token =
splunk_access_host =
splunk_access_port =
splunk_hec_enabled = 0
splunk_hec_token =
splunk_hec_url =

[connection2]
api_token = WSDG3A5R94Z0D822K6FWRO6AXZAQ0F36NCI6RELCMKGO7HRZ03HRME7B2BJ67MWH
api_workers = 1
container_enable_https = 1
container_enable_keepalive = 0
docker_url = tcp://10.1.30.24:2375
docker_network =
endpoint_cert_filename_or_path = /home/splunk/etc/apps/mltk-container/local/lab-ca.crt
endpoint_cert_check_hostname = 0
endpoint_hostname = https://10.1.30.24:8501
endpoint_hostname_external = https://10.1.30.24:8501
docker_logging_endpoint_hostname =
docker_logging_splunk_token =
image_pull_secrets = None
in_cluster_mode = false
is_configured_complete = 1
olly_enabled = 0
olly_splunk_access_token =
olly_otel_endpoint =
olly_otel_service_name =
splunk_access_enabled = 0
splunk_access_token =
splunk_access_host =
splunk_access_port =
splunk_hec_enabled = 0
splunk_hec_token =
splunk_hec_url =
EOF

The api_token value is DSDL’s own internal authorization token – it is generated when you run the Setup UI and is used as an Authorization header in every request DSDL sends to your container. Your container app.py does not need to validate this token (the DSDL-native container examples do not either), but DSDL will not send requests without it.
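Once the file is in place, btool shows you exactly how the layers merged – every key should resolve from your local file rather than from default:

# --debug prints the source file for each resolved key
$SPLUNK_HOME/bin/splunk btool docker list --app=mltk-container --debug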

Step 7 – Writing containers.conf

DSDL uses containers.conf to map model names to container endpoints. This file is separate from docker.conf and is not configured through the Setup UI. Without a [__dev__] stanza, every fit and apply call fails with "no config found for model name -- switching to default __dev__ container" followed by a blank-endpoint error.

Add a stanza for every model you plan to train before running the fit command:

cat > $SPLUNK_HOME/etc/apps/mltk-container/local/containers.conf << 'EOF'
[default]

[__dev__]
api_url = https://10.1.30.23:8501
api_url_external = https://10.1.30.23:8501
environment = connection

[smoke_test_model]
api_url = https://10.1.30.23:8501
api_url_external = https://10.1.30.23:8501
environment = connection

[zeek_net_anomaly]
api_url = https://10.1.30.23:8501
api_url_external = https://10.1.30.23:8501
environment = connection

[dns_tunnel_anomaly]
api_url = https://10.1.30.24:8501
api_url_external = https://10.1.30.24:8501
environment = connection2

[auth_anomaly]
api_url = https://10.1.30.24:8501
api_url_external = https://10.1.30.24:8501
environment = connection2
EOF
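The same btool check works for this file – useful for catching a typo in a stanza name before a fit call fails against it:

$SPLUNK_HOME/bin/splunk btool containers list --app=mltk-container --debug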

Restart Splunk to pick up all configuration changes:

$SPLUNK_HOME/bin/splunk restart

Step 8 – End-to-End Smoke Tests

Run all four tests in order. Each validates a layer the next test depends on.

Test 1 – DSDL urllib SSL connection (simulates exactly what DSDL does):

$SPLUNK_HOME/bin/splunk cmd python3 << 'PYEOF'
import urllib.request as urllib_request, ssl, json

url      = 'https://10.1.30.23:8501/fit'
api_token = 'WSDG3A5R94Z0D822K6FWRO6AXZAQ0F36NCI6RELCMKGO7HRZ03HRME7B2BJ67MWH'
ca_file  = '/home/splunk/etc/apps/mltk-container/local/lab-ca.crt'

ssl_ctx = ssl.create_default_context(cafile=ca_file)
ssl_ctx.check_hostname = False

payload = {"data": "bytes_in,bytes_out\n100,200\n300,400\n",
           "meta": {"options": {}, "feature_variables": ["bytes_in","bytes_out"]}}
data_encoded = str.encode(json.dumps(payload))
header = {'Authorization': api_token, 'Content-Type': 'application/json'}
req = urllib_request.Request(url, data_encoded, header)

try:
    response = urllib_request.urlopen(req, context=ssl_ctx)
    parsed = json.loads(response.read())
    print('SUCCESS -- status:', parsed.get('status'))
except Exception as e:
    print('FAILED:', str(e))
PYEOF

Expected: SUCCESS -- status: success

Test 2 – Splunk fit command:

| makeresults count=50
| eval bytes_in=random()%10000, bytes_out=random()%6000,
       duration_sec=round(random()%10+0.1,2),
       packet_count=random()%100, unique_ports=random()%8,
       failed_logins=random()%5
| fit MLTKContainer algo=isolation_forest
    bytes_in, bytes_out, duration_sec, packet_count, unique_ports, failed_logins
    into app:smoke_test_model
    environment=1
| table bytes_in, bytes_out, anomaly_score, is_anomaly, anomaly_label

Expected: 50 rows with anomaly_score, is_anomaly, and anomaly_label populated.

Test 3 – Splunk apply command:

| makeresults count=10
| eval bytes_in=random()%10000, bytes_out=random()%6000,
       duration_sec=round(random()%10+0.1,2),
       packet_count=random()%100, unique_ports=random()%8,
       failed_logins=random()%5
| apply smoke_test_model
| table bytes_in, bytes_out, anomaly_score, is_anomaly, anomaly_label

Important: the correct apply syntax is | apply model_name – the model name is the first positional argument. | apply MLTKContainer algo=... is not how MLTK’s apply command works. MLTK infers the algorithm from the saved model registry.

Test 4 – HEC events in Splunk:

index=ai_inference sourcetype=jetson_inference earliest=-15m
| table _time, host, anomaly_score, is_anomaly, anomaly_label
| sort -anomaly_score

Expected: rows with host=k8clstr01cm and anomaly scores populated. These events were pushed back to Splunk by the container’s background HEC thread when anomalies were detected during the apply call.

Conclusion

The pipeline is fully wired. Splunk can train models on the Nano containers, apply those models to live data, and receive anomalous events back via HEC – all without leaving the Splunk SPL interface.

Three DSDL behaviors that are absent from the documentation are now explicit from reading its source code: the CSV wire protocol, the urllib SSL context behavior that requires endpoint_cert_filename_or_path, and the containers.conf model registry that must be pre-populated before fit runs. These three account for the majority of DSDL deployment failures on non-standard hardware.

In Part 4, you will train real models on your actual security data – Zeek connection logs, Splunk Stream DNS, and Windows Security events – and build ES correlation rules that generate actionable notable events from AI-scored anomalies.


Part 1: Architecture and Concepts
Part 2: Building the DSDL-Native Inference Container on Jetson Nano
Part 4: Real Security Data – Training and Deploying Anomaly Detection Models