Introduction

In Part 1, you learned why an edge AI inference pipeline is a compelling approach for security operations and saw the key constraints imposed by the Jetson Nano 4GB hardware and DSDL 5.2.3’s actual behavior. In this post, you will build the inference container – every file, every command, and every decision explained.

The build covers four areas: the Dockerfile and its critical dependencies, the DSDL-native Flask application that implements the correct wire protocol, TLS certificate generation using a lab CA, and the distribution workflow for deploying the finished image to all four nodes.

By the end of this post, all four Nano nodes will be running HTTPS inference containers that speak DSDL’s native protocol, verified end to end from your admin machine.

Prerequisites

  • Four Jetson Nano 4GB Developer Kit nodes with JetPack 4.6.6 installed and Docker running
  • SSH access to all four nodes from a central admin machine
  • OpenSSL available on the machine where you will generate certificates
  • 10 GB free disk space on node 1 for the build process and image export

Step 1 – Understanding the Base Image Choice

The most important decision in the Dockerfile is the base image. Splunk’s published DSDL container images are built for x86_64 (Intel/AMD) processors. The Jetson Nano is ARM64 (aarch64). Docker will refuse to run an x86_64 image on ARM64 with an exec format error. You cannot work around this without building your own image.

NVIDIA publishes official ARM64 container images for JetPack through their NGC registry. The image nvcr.io/nvidia/l4t-ml:r32.7.1-py3 is the right choice for JetPack 4.6.6 (L4T R32.7.x). This image already contains Python 3.6.9, scikit-learn 0.23.2, numpy 1.19.5, pandas 1.1.5, and scipy – everything the inference server needs except Flask. By starting here instead of a generic Ubuntu base, you eliminate the need to compile ARM64 Python wheels, which is the most common failure point when building ML containers for Jetson hardware.

The r32.7.1 tag is compatible with r32.7.6 on your nodes. Minor point versions within R32.7.x share the same ABI.
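
If you want to confirm the architecture before committing to a multi-gigabyte build, a quick optional check on node 1 is:

docker pull nvcr.io/nvidia/l4t-ml:r32.7.1-py3
docker image inspect --format '{{.Architecture}}' nvcr.io/nvidia/l4t-ml:r32.7.1-py3
# Expected output: arm64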

Step 2 – Writing the Dockerfile

Create the project directory on node 1 and write the Dockerfile:

mkdir -p ~/dsdl-nano
cat > ~/dsdl-nano/Dockerfile << 'EOF'
FROM nvcr.io/nvidia/l4t-ml:r32.7.1-py3

# Flask 1.x required -- Flask 2+ dropped Python 3.6 support
# Werkzeug 1.x pinned to match Flask 1.x's API expectations
RUN pip3 install --no-cache-dir \
    flask==1.1.4 \
    Werkzeug==1.0.1

WORKDIR /app

# CRITICAL: create these directories explicitly before any volume mounts
# Without this, Docker creates directory placeholders instead of file mounts
# causing IsADirectoryError when Flask tries to load the certificate
RUN mkdir -p /app/certs /app/model

COPY app.py .

EXPOSE 8501

# Use HTTPS in healthcheck since container serves TLS when certs are mounted
# The -k flag bypasses cert verification -- container lacks the lab CA
HEALTHCHECK --interval=15s --timeout=5s --start-period=30s --retries=3 \
    CMD curl -sfk https://localhost:8501/ || exit 1

CMD ["python3", "app.py"]
EOF

Two Dockerfile decisions deserve explanation. Flask is pinned to 1.1.4 and Werkzeug to 1.0.1 because Flask 2.0 dropped support for Python 3.6, and Werkzeug 2.0 introduced breaking API changes that Flask 1.x cannot handle. Both pins are hard requirements on this platform’s Python 3.6.

The RUN mkdir -p /app/certs /app/model line is not optional. When Docker mounts a file into a container using a volume bind (-v host_file:/container/path), if the parent directory does not exist in the image, Docker creates a directory at that path instead of a file mount. The result is that ssl_ctx.load_cert_chain(certfile="/app/certs/chain.pem") fails with IsADirectoryError: [Errno 21] Is a directory because /app/certs/chain.pem is a directory, not a file. Creating the directories explicitly in the image prevents this.

Step 3 – Writing the DSDL-Native Inference Server

The Flask application is the most critical piece of this build. Most DSDL container examples online implement a custom JSON protocol with columns and data arrays. That is wrong for DSDL 5.2.x. Reading DSDL’s source code at $SPLUNK_HOME/etc/apps/mltk-container/bin/mltkc/MLTKContainer.py reveals the actual protocol:

DSDL sends data as a CSV string inside a JSON wrapper:

{
  "data": "bytes_in,bytes_out\n100,200\n300,400\n",
  "meta": {
    "options": {"algo": "isolation_forest"},
    "feature_variables": ["bytes_in", "bytes_out"]
  }
}

DSDL expects results back as a CSV string in a results key:

{
  "status": "success",
  "message": "...",
  "results": "anomaly_score,is_anomaly,anomaly_label\n0.23,0,normal\n0.91,1,ANOMALY\n"
}

DSDL then reads the results with pd.read_csv(StringIO(result["results"])) and merges them with the original dataframe. Any other response format causes a silent JSON parse failure that appears in Splunk as unable to read JSON response.
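
You can exercise this protocol by hand once a container is running (Step 6). The payload below is a minimal hand-built /fit request matching the structure above – the feature values are placeholders, and -k skips verification because localhost is not in the certificate SAN:

curl -sk https://localhost:8501/fit \
  -H 'Content-Type: application/json' \
  -d '{"data":"bytes_in,bytes_out\n100,200\n300,400\n","meta":{"options":{"algo":"isolation_forest"},"feature_variables":["bytes_in","bytes_out"]}}'

# A correct response carries the scores as CSV inside the results key:
# {"status": "success", "message": "...", "results": "anomaly_score,is_anomaly,..."}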

Create the application file:

cat > ~/dsdl-nano/app.py << 'APPEOF'
import json, os, sys, logging, traceback
from io import StringIO

import numpy as np
import pandas as pd
import joblib
from flask import Flask, request, Response

logging.basicConfig(level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(message)s", stream=sys.stdout)
log = logging.getLogger(__name__)

app = Flask(__name__)
MODEL_PATH = "/app/model/model.pkl"
META_PATH  = "/app/model/model_meta.json"
model      = None
model_meta = {}

SPLUNK_META = {"_time","_raw","host","source","sourcetype","index",
               "splunk_server","_bkt","_cd","_si"}

def load_model():
    global model, model_meta
    if os.path.exists(MODEL_PATH):
        model = joblib.load(MODEL_PATH)
        if os.path.exists(META_PATH):
            with open(META_PATH) as f:
                model_meta = json.load(f)
        log.info("Model loaded. Features: %s", model_meta.get("features"))

def parse_payload(payload):
    data_csv = payload.get("data", "")
    meta     = payload.get("meta", {})
    options  = meta.get("options", {})
    fvars    = meta.get("feature_variables", [])
    df = pd.read_csv(StringIO(data_csv))
    return df, options, fvars

def get_features(df, fvars=None):
    candidates = fvars if fvars else [
        c for c in df.columns if c not in SPLUNK_META
        and c not in ("anomaly_score","is_anomaly","anomaly_label")]
    return [c for c in candidates if c in df.columns and
            pd.to_numeric(df[c], errors="coerce").notna().any()]

def dsdl_ok(results_df, message=""):
    return Response(json.dumps({
        "status": "success", "message": message,
        "results": results_df.to_csv(index=False)
    }), status=200, mimetype="application/json")

def dsdl_err(msg, code=500):
    return Response(json.dumps({
        "status": "error", "message": msg, "results": ""
    }), status=code, mimetype="application/json")

@app.route("/")
def health():
    return {"status":"ok","model":"isolation_forest",
            "model_loaded": model is not None,
            "features": model_meta.get("features",[]),
            "l4t":"r32.7.6","python":sys.version.split()[0],
            "protocol":"dsdl-native"}

@app.route("/fit", methods=["POST"])
def fit():
    global model, model_meta
    try:
        payload      = request.get_json(force=True)
        df, opts, fv = parse_payload(payload)
        features     = get_features(df, fv)
        if not features:
            return dsdl_err("No numeric feature columns found", 400)
        X = df[features].fillna(0).astype(float).values
        log.info("/fit -- %d samples, features: %s", len(X), features)

        from sklearn.ensemble import IsolationForest
        m = IsolationForest(contamination=float(opts.get("contamination",0.05)),
                            n_estimators=int(opts.get("n_estimators",100)),
                            random_state=42, n_jobs=-1)
        m.fit(X)
        os.makedirs("/app/model", exist_ok=True)
        joblib.dump(m, MODEL_PATH)
        model_meta = {"features": features, "n_samples": len(X)}
        with open(META_PATH, "w") as f:
            json.dump(model_meta, f)
        model = m

        # score_samples: higher = more normal, so invert the min-max
        # scaled value to get 0..1 scores where higher = more anomalous
        raw   = m.score_samples(X)
        preds = m.predict(X)
        norm  = (raw - raw.min()) / (raw.max() - raw.min() + 1e-9)
        scores = np.round(1.0 - norm, 4)
        results = pd.DataFrame({
            "anomaly_score": scores,
            "is_anomaly":    (preds==-1).astype(int),
            "anomaly_label": ["ANOMALY" if p==-1 else "normal" for p in preds]
        })
        return dsdl_ok(results,
            "Isolation Forest trained on %d samples" % len(X))
    except Exception as e:
        log.error("/fit error: %s", traceback.format_exc())
        return dsdl_err(str(e))

@app.route("/apply", methods=["POST"])
def apply():
    if model is None:
        return dsdl_err("No model loaded. Run fit first.", 500)
    try:
        payload      = request.get_json(force=True)
        df, opts, fv = parse_payload(payload)
        features     = get_features(df, model_meta.get("features") or fv)
        if not features:
            return dsdl_err("No numeric feature columns found", 400)
        X = df[features].fillna(0).astype(float).values
        log.info("/apply -- %d samples", len(X))

        raw   = model.score_samples(X)
        preds = model.predict(X)
        norm  = (raw - raw.min()) / (raw.max() - raw.min() + 1e-9)
        scores = np.round(1.0 - norm, 4)
        results = pd.DataFrame({
            "anomaly_score": scores,
            "is_anomaly":    (preds==-1).astype(int),
            "anomaly_label": ["ANOMALY" if p==-1 else "normal" for p in preds]
        })

        anomalies = df.copy()
        anomalies["anomaly_score"] = scores
        anomalies["is_anomaly"]    = (preds==-1).astype(int)
        rows = anomalies[anomalies["is_anomaly"]==1].to_dict(orient="records")
        if rows:
            import threading
            threading.Thread(target=send_hec, args=(rows,), daemon=True).start()

        return dsdl_ok(results,
            "%d anomalies in %d samples" % (int((preds==-1).sum()), len(X)))
    except Exception as e:
        log.error("/apply error: %s", traceback.format_exc())
        return dsdl_err(str(e))

def send_hec(events):
    import requests as r, random
    urls   = [u.strip() for u in
              os.environ.get("SPLUNK_HEC_URLS","").split(",") if u.strip()]
    token  = os.environ.get("SPLUNK_HEC_TOKEN","")
    index  = os.environ.get("SPLUNK_HEC_INDEX","ai_inference")
    host   = os.environ.get("SPLUNK_HEC_HOST","jetson-nano")
    # verify TLS unless the env var is exactly "false" (the lab default)
    verify = os.environ.get("SPLUNK_HEC_SSL_VERIFY","false").lower() != "false"
    if not urls or not token:
        return
    payload = "".join(json.dumps({
        "sourcetype":"jetson_inference","index":index,
        "host":host,"event":e}) for e in events)
    try:
        r.post(random.choice(urls),
               headers={"Authorization":"Splunk "+token},
               data=payload, verify=verify, timeout=5)
    except Exception as ex:
        log.warning("HEC push failed: %s", ex)

if __name__ == "__main__":
    import ssl as ssl_lib
    load_model()
    log.info("DSDL inference server starting on port 8501")
    CERT = "/app/certs/chain.pem"
    KEY  = "/app/certs/node.key"
    if os.path.exists(CERT) and os.path.exists(KEY):
        ctx = ssl_lib.SSLContext(ssl_lib.PROTOCOL_TLS_SERVER)
        ctx.load_cert_chain(certfile=CERT, keyfile=KEY)
        log.info("TLS enabled -- serving HTTPS on port 8501")
        app.run(host="0.0.0.0", port=8501,
                ssl_context=ctx, debug=False, threaded=True)
    else:
        log.warning("No certs found -- falling back to HTTP")
        app.run(host="0.0.0.0", port=8501, debug=False, threaded=True)
APPEOF

Note that non-ASCII characters are intentionally avoided in this file. Python 3.6 inside a Docker container with no locale configured defaults to ASCII stdout encoding. Any character above U+007F in a print statement or log message will raise UnicodeEncodeError at startup and crash the container. Use plain ASCII throughout.
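
Once a container is running (Step 6), you can confirm what encoding the interpreter actually sees:

docker exec dsdl-inference python3 -c "import sys; print(sys.stdout.encoding)"
# An unconfigured locale typically prints ANSI_X3.4-1968 (ASCII)

Setting -e PYTHONIOENCODING=utf-8 on the container would also avoid the crash, but plain ASCII keeps the image self-contained.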

Step 4 – Generating TLS Certificates

DSDL requires HTTPS for the container endpoint. You will generate a lab CA and sign a certificate for each node. Run these commands on the machine where you will manage certificates (the search head or a dedicated admin machine). The commands use only standard OpenSSL available on any Linux system.

mkdir -p ~/lab-pki && cd ~/lab-pki

# Generate lab CA -- 4096-bit RSA, 10-year validity for lab use
openssl genrsa -out lab-ca.key 4096
openssl req -new -x509 -days 3650 -key lab-ca.key -out lab-ca.crt \
  -subj '/C=US/ST=Lab/L=Lab/O=SecurityLab/OU=InferenceLab/CN=SecurityLab-RootCA'

Generate a certificate for each node. The Subject Alternative Name (SAN) block is required – modern TLS clients ignore a hostname that appears only in the CN and match against the SAN instead. You must include both the hostname and IP in the SAN or DSDL’s SSL verification will fail:

# Repeat for each node -- change NODE_NAME and NODE_IP
NODE_NAME='k8clstr01cm'
NODE_IP='10.1.30.23'

openssl genrsa -out ${NODE_NAME}.key 2048

cat > ${NODE_NAME}-csr.cnf << EOF
[req]
default_bits = 2048
distinguished_name = req_distinguished_name
req_extensions = req_ext
prompt = no
[req_distinguished_name]
CN = ${NODE_NAME}.test.lab
[req_ext]
subjectAltName = @alt_names
basicConstraints = CA:FALSE
keyUsage = digitalSignature, keyEncipherment
extendedKeyUsage = serverAuth
[alt_names]
DNS.1 = ${NODE_NAME}.test.lab
DNS.2 = ${NODE_NAME}
IP.1  = ${NODE_IP}
EOF

openssl req -new -key ${NODE_NAME}.key -out ${NODE_NAME}.csr \
  -config ${NODE_NAME}-csr.cnf

openssl x509 -req -days 730 -in ${NODE_NAME}.csr \
  -CA lab-ca.crt -CAkey lab-ca.key -CAcreateserial \
  -out ${NODE_NAME}.crt \
  -extensions req_ext -extfile ${NODE_NAME}-csr.cnf

# Full chain: node cert + CA cert -- Flask needs both in one file
cat ${NODE_NAME}.crt lab-ca.crt > ${NODE_NAME}-chain.pem

# Verify
openssl verify -CAfile lab-ca.crt ${NODE_NAME}.crt
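
It is also worth confirming the SAN block actually made it into the signed certificate – if the -extfile flag is omitted, openssl silently drops the extensions:

openssl x509 -in ${NODE_NAME}.crt -noout -text | grep -A1 'Subject Alternative Name'
# Expected: DNS:k8clstr01cm.test.lab, DNS:k8clstr01cm, IP Address:10.1.30.23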

Distribute the certificates to node 1:

ssh [email protected] 'mkdir -p ~/dsdl-nano/certs && chmod 700 ~/dsdl-nano/certs'
scp ~/lab-pki/k8clstr01cm.key       [email protected]:~/dsdl-nano/certs/
scp ~/lab-pki/k8clstr01cm-chain.pem [email protected]:~/dsdl-nano/certs/

Step 5 – Building the Docker Image

Build on node 1. The --no-cache flag forces every build step to run fresh rather than reusing cached layers, which matters if an earlier build attempt left behind a stale layer with the wrong directory structure:

# On k8clstr01cm (10.1.30.23)
cd ~/dsdl-nano
docker build --no-cache -t dsdl-nano:1.0 .

The build will take 3-5 minutes. Most of the time is Flask installation – the l4t-ml base image already has everything else.

Verify the image before starting the container:

docker images | grep dsdl-nano
# Should show: dsdl-nano   1.0   <id>   <time>   ~2.1GB

Step 6 – Starting the Inference Container

Start the container on node 1 with all required environment variables and volume mounts. Replace INDEXER1_IP and INDEXER2_IP with your actual Splunk indexer IPs:

docker run -d \
  --name dsdl-inference \
  --runtime=nvidia \
  --restart unless-stopped \
  -p 8501:8501 \
  -e SPLUNK_HEC_URLS='https://INDEXER1_IP:8088/services/collector,https://INDEXER2_IP:8088/services/collector' \
  -e SPLUNK_HEC_TOKEN='your-hec-token-guid' \
  -e SPLUNK_HEC_INDEX='ai_inference' \
  -e SPLUNK_HEC_HOST='k8clstr01cm' \
  -e SPLUNK_HEC_SSL_VERIFY='false' \
  -v ~/dsdl-nano/model:/app/model \
  -v ~/dsdl-nano/certs/k8clstr01cm-chain.pem:/app/certs/chain.pem:ro \
  -v ~/dsdl-nano/certs/k8clstr01cm.key:/app/certs/node.key:ro \
  dsdl-nano:1.0

The SPLUNK_HEC_SSL_VERIFY=false setting is correct for the Nanos. They do not have Splunk’s internal certificate authority in their trust stores, so they cannot verify the indexers’ TLS certificates. This is acceptable for a lab environment on an isolated network.
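
If you later decide to enable verification, the send_hec function passes verify=True through to the requests library, which honors the standard REQUESTS_CA_BUNDLE environment variable. A sketch of the extra docker run flags, assuming you have exported the indexers' CA certificate to ~/splunk-ca.crt on the node:

# Replace the SPLUNK_HEC_SSL_VERIFY line and add the CA mount:
  -e SPLUNK_HEC_SSL_VERIFY='true' \
  -e REQUESTS_CA_BUNDLE='/app/certs/splunk-ca.crt' \
  -v ~/splunk-ca.crt:/app/certs/splunk-ca.crt:ro \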

Wait 10 seconds for startup, then verify:

sleep 10
docker logs dsdl-inference | grep -Ei "tls|https|warn|error"
# Expected: TLS enabled -- serving HTTPS on port 8501
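
If the logs show IsADirectoryError instead, the certificate mounts resolved to directories rather than files – the failure mode described in Step 2. Check how they landed:

docker exec dsdl-inference ls -l /app/certs/
# chain.pem and node.key should be regular files, not directories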

Step 7 – Distributing to Remaining Nodes

Build once, distribute to all. Export the finished image from node 1 and copy to the other three nodes:

# On node 1 -- export the image
docker save dsdl-nano:1.0 | gzip > /tmp/dsdl-nano.tar.gz

# Copy to remaining nodes
for NODE_IP in 10.1.30.24 10.1.30.25 10.1.30.26; do
    echo "=== Copying to ${NODE_IP} ==="
    scp /tmp/dsdl-nano.tar.gz k8admin@${NODE_IP}:/tmp/
done
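
The image tarball does not include certificates. From the machine where you generated them in Step 4, copy each node's key and chain the same way you did for node 1 – shown here for the first worker; repeat with each node's name and IP:

ssh [email protected] 'mkdir -p ~/dsdl-nano/certs && chmod 700 ~/dsdl-nano/certs'
scp ~/lab-pki/k8clstr01wk01.key       [email protected]:~/dsdl-nano/certs/
scp ~/lab-pki/k8clstr01wk01-chain.pem [email protected]:~/dsdl-nano/certs/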

On each remaining node, load the image and start the container with the node-specific values:

# On k8clstr01wk01 (10.1.30.24) -- example
docker load < /tmp/dsdl-nano.tar.gz
docker run -d \
  --name dsdl-inference \
  --runtime=nvidia \
  --restart unless-stopped \
  -p 8501:8501 \
  -e SPLUNK_HEC_URLS='https://INDEXER1_IP:8088/services/collector,https://INDEXER2_IP:8088/services/collector' \
  -e SPLUNK_HEC_TOKEN='your-hec-token-guid' \
  -e SPLUNK_HEC_INDEX='ai_inference' \
  -e SPLUNK_HEC_HOST='k8clstr01wk01' \
  -e SPLUNK_HEC_SSL_VERIFY='false' \
  -v ~/dsdl-nano/model:/app/model \
  -v ~/dsdl-nano/certs/k8clstr01wk01-chain.pem:/app/certs/chain.pem:ro \
  -v ~/dsdl-nano/certs/k8clstr01wk01.key:/app/certs/node.key:ro \
  dsdl-nano:1.0

Only two values change per node: SPLUNK_HEC_HOST and the cert filenames in the volume mounts.
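
Because the node names double as hostnames and certificate filename prefixes, you can fold both substitutions into one launcher script – a sketch, assuming hostname on each node returns the same name you used when signing its certificate:

NODE_NAME=$(hostname)
docker run -d \
  --name dsdl-inference \
  --runtime=nvidia \
  --restart unless-stopped \
  -p 8501:8501 \
  -e SPLUNK_HEC_URLS='https://INDEXER1_IP:8088/services/collector,https://INDEXER2_IP:8088/services/collector' \
  -e SPLUNK_HEC_TOKEN='your-hec-token-guid' \
  -e SPLUNK_HEC_INDEX='ai_inference' \
  -e SPLUNK_HEC_HOST="${NODE_NAME}" \
  -e SPLUNK_HEC_SSL_VERIFY='false' \
  -v ~/dsdl-nano/model:/app/model \
  -v ~/dsdl-nano/certs/${NODE_NAME}-chain.pem:/app/certs/chain.pem:ro \
  -v ~/dsdl-nano/certs/${NODE_NAME}.key:/app/certs/node.key:ro \
  dsdl-nano:1.0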

Step 8 – Verifying All Four Nodes

Run this from your admin machine to verify all four containers are healthy and serving HTTPS:

for NODE_IP in 10.1.30.23 10.1.30.24 10.1.30.25 10.1.30.26; do
    echo -n "${NODE_IP}: "
    curl -s --cacert ~/lab-pki/lab-ca.crt https://${NODE_IP}:8501/ \
        | python3 -c "import sys,json; d=json.load(sys.stdin); print('status=%s model_loaded=%s protocol=%s' % (d['status'], d['model_loaded'], d['protocol']))" \
        2>/dev/null || echo "FAILED"
done

All four should return status=ok and protocol=dsdl-native. model_loaded shows False until a model has been trained (that happens in Part 4) or a model.pkl already exists in the mounted model directory. If any node returns FAILED, check docker logs dsdl-inference on that node for the specific error.

Conclusion

You now have four HTTPS inference containers running on Jetson Nano hardware, each implementing DSDL’s native CSV wire protocol and ready to receive fit and apply requests from Splunk. The key decisions that make this work:

  • The l4t-ml base image eliminates ARM64 compilation problems
  • The explicit mkdir in the Dockerfile prevents Docker’s directory-placeholder bug
  • The Flask 1.x and Werkzeug 1.x pins keep the stack compatible with Python 3.6
  • The DSDL-native CSV protocol matches what DSDL actually sends rather than what examples show
  • The chain PEM file gives Flask’s TLS context the full certificate chain

In Part 3, you will wire this container infrastructure to Splunk – HEC configuration on the indexer cluster, DSDL setup on the search head, the critical docker.conf and containers.conf configuration that DSDL actually uses, and the end-to-end smoke tests that confirm the full pipeline works.


Part 1: Architecture and Concepts
Part 3: Wiring the Pipeline – DSDL Configuration, HEC, and Splunk Integration
Part 4: Real Security Data – Training and Deploying Anomaly Detection Models