[{"content":"Most security teams have the same problem: data volume is growing faster than analyst capacity, and signature-based detection alone is not catching the sophisticated, low-and-slow attacks that matter most. Machine learning promises to help, but production ML in a SOC is genuinely hard to operationalize. The gap between a data science notebook and a running inference pipeline that feeds your SIEM is wider than most blog posts acknowledge.\nThis series documents the end-to-end build of a working edge AI inference pipeline in a real security lab \u0026ndash; not a cloud demo, not a toy dataset. The hardware is four NVIDIA Jetson Nano 4GB Developer Kit nodes. The SIEM is Splunk Enterprise 10.0 with Enterprise Security. The ML toolkit is Splunk\u0026rsquo;s own Deep Learning Toolkit (DSDL). The detection targets are real security data sources: Zeek connection logs, Splunk Stream DNS telemetry, and Windows Security event logs.\nBy the end of this four-part series, you will have a working blueprint for:\nRunning GPU-accelerated inference containers on constrained edge hardware\nWiring those containers to Splunk DSDL\u0026rsquo;s native protocol\nTraining Isolation Forest models on real security data\nGenerating scored anomaly events that drive ES correlation rules and notable events\nThis first post covers the architecture and conceptual foundation. 
If you are a security architect or senior security engineer who has been curious about operationalizing ML in your SOC without a massive cloud spend, this series is written for you.\nPrerequisites Before following along, you should have:\nExperience with Splunk ES administration and SPL\nFamiliarity with Docker and Linux system administration\nA working Splunk Enterprise 10.0 instance with ES installed\nBasic understanding of supervised and unsupervised machine learning concepts\nAccess to Zeek, DNS, or Windows event log data in Splunk\nStep 1 \u0026ndash; Understanding the Problem Space The fundamental challenge with ML in a SOC is not the algorithm \u0026ndash; it is the operational pipeline. You need data to flow from your SIEM to the model, predictions to flow back, and the results to be actionable. Most ML-for-security projects fail not because the model is wrong but because the pipeline around it is brittle.\nTraditional SIEM-to-ML pipelines have three common failure modes. First, they require a data science team to maintain a separate ML platform that security operations has no visibility into. Second, they use batch processing that introduces latency between detection and response. Third, they generate predictions that have no clear path into the analyst workflow.\nThe approach in this series addresses all three. DSDL runs inference containers close to your Splunk deployment, predictions feed back into ES as scored events that drive correlation rules, and the entire pipeline is managed through SPL that your Splunk team already understands.\nStep 2 \u0026ndash; Understanding Edge Inference An inference service is a long-running web server that has a trained model loaded in memory and answers HTTP requests with predictions. When Splunk sends data to it, the model runs on that data and returns anomaly scores or classification labels. 
The server stays running between requests \u0026ndash; the model is loaded once at startup, not reloaded for each request.\nThis is conceptually different from batch ML where you export data, run a script, and import results. Inference services are always on and respond in near-real-time, which is what makes them useful in a detection pipeline.\nThe edge aspect means running these inference services on hardware that lives in your environment rather than a cloud provider. For a security lab, this has several advantages. Data stays on your network. You maintain full control over model versions. There is no egress cost. And for air-gapped or restricted environments, it is often the only viable approach.\nStep 3 \u0026ndash; Understanding the Jetson Nano Constraints The NVIDIA Jetson Nano 4GB Developer Kit is a capable edge AI platform but it has hard constraints that shape every architectural decision in this series.\nThe hardware runs JetPack 4.6.6, which is the ceiling for this device. JetPack 4.6.6 provides Ubuntu 18.04, Python 3.6.9 natively, and CUDA 10.2. These are not soft constraints \u0026ndash; you cannot upgrade JetPack on the original Nano to a newer version.\nThe practical implications are significant. Kubernetes and K3s will not run on this hardware \u0026ndash; the kernel is too old. Flask 2.x will not install \u0026ndash; it dropped Python 3.6 support. Large language models and transformer architectures will not fit in 4 GB of unified RAM. The right models for this hardware are small, tabular, scikit-learn-based algorithms. 
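As a concrete illustration of that model class \u0026ndash; a hedged sketch on synthetic data, not the pipeline code built later in this series:

```python
# Illustrative only: a small scikit-learn Isolation Forest on synthetic
# tabular data, the class of model this hardware handles comfortably.
# Feature values and shapes here are made up for demonstration.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# ~200 "normal" flows plus two extreme outliers
normal = rng.normal(loc=[500.0, 800.0], scale=[50.0, 80.0], size=(200, 2))
outliers = np.array([[50000.0, 90000.0], [40000.0, 70000.0]])
X = np.vstack([normal, outliers])

model = IsolationForest(contamination=0.05, n_estimators=100, random_state=42)
preds = model.fit_predict(X)  # -1 = anomaly, 1 = normal
print("flagged:", int((preds == -1).sum()))
```

Training and scoring a few hundred rows like this completes in a fraction of a second on a modest CPU, with a memory footprint far below the Nano\u0026rsquo;s 4 GB ceiling.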
Isolation Forest is the perfect match: it runs entirely on CPU, has a tiny RAM footprint around 50 MB, and is exactly the kind of unsupervised anomaly detector that security data calls for.\nWorking within these constraints rather than fighting them is what makes the build in this series reliable and repeatable.\nStep 4 \u0026ndash; Understanding How DSDL Works Splunk\u0026rsquo;s Deep Learning Toolkit (DSDL) is a Splunk app that bridges SPL and Docker containers running inference servers. When you write | fit MLTKContainer algo=isolation_forest in a Splunk search, DSDL serializes your search results as a CSV string, POSTs them to an endpoint on your inference container over HTTPS, and the container returns predictions as another CSV string that DSDL merges back into your Splunk results.\nThree things about DSDL\u0026rsquo;s actual behavior differ from what the documentation implies:\nFirst, DSDL does not send JSON arrays. It sends data as a CSV string inside a JSON wrapper: {\u0026quot;data\u0026quot;: \u0026quot;\u0026lt;csv string\u0026gt;\u0026quot;, \u0026quot;meta\u0026quot;: {\u0026quot;options\u0026quot;: {...}, \u0026quot;feature_variables\u0026quot;: [...]}}. Custom containers must parse this format exactly or the pipeline silently fails.\nSecond, DSDL uses Python\u0026rsquo;s urllib library with a dynamically constructed SSL context. When using self-signed certificates, DSDL fetches only the leaf certificate from the server and builds an SSL context from it \u0026ndash; which cannot verify the certificate chain against your CA. The fix is to specify the CA certificate path in DSDL\u0026rsquo;s docker.conf configuration using the endpoint_cert_filename_or_path key.\nThird, DSDL uses a containers.conf file to map model names to container endpoints. Without a [__dev__] stanza in this file, every fit and apply call fails with a blank endpoint error. 
This is not documented clearly anywhere in Splunk\u0026rsquo;s official documentation.\nUnderstanding these three behaviors upfront saves hours of debugging.\nStep 5 \u0026ndash; Designing the Architecture The as-built architecture for this series has four layers.\nThe hardware layer consists of four Jetson Nano nodes running JetPack 4.6.6 with Docker. Each node runs one inference container and exposes two ports: 8501 for HTTPS inference requests and 2375 for the Docker TCP API that DSDL uses to manage container lifecycle.\nThe inference layer is a custom Python Flask application implementing DSDL\u0026rsquo;s native endpoint protocol. It loads a scikit-learn Isolation Forest model at startup and exposes two routes: /fit for training and /apply for inference. The application is packaged as a Docker image built from NVIDIA\u0026rsquo;s official nvcr.io/nvidia/l4t-ml:r32.7.1-py3 base image, which is pre-loaded with scikit-learn, numpy, and pandas.\nThe integration layer is DSDL configured on the Splunk search head with two environments: Environment 1 points to node 1 for Zeek network anomaly detection, and Environment 2 points to node 2 for DNS tunneling detection.\nThe detection layer consists of Splunk ES correlation rules that consume the anomaly_score, is_anomaly, and anomaly_label fields returned by the inference containers and generate notable events when scores exceed defined thresholds.\nData flows in two directions. From Splunk to the Nanos: the search head POSTs CSV data via HTTPS to port 8501. From the Nanos to Splunk: anomalous events push back to Splunk\u0026rsquo;s HTTP Event Collector on port 8088 on the indexer cluster.\nConclusion This post established the conceptual foundation for everything that follows. The key takeaways are:\nInference services are always-on web servers that load models at startup and answer HTTP requests with predictions. 
Edge inference means running these services on local hardware, keeping data in your environment and eliminating cloud dependencies.\nThe Jetson Nano 4GB is capable edge AI hardware with real constraints. Working within JetPack 4.6.6\u0026rsquo;s limits requires specific software choices but the hardware is well-matched to tabular security data anomaly detection.\nDSDL is more opinionated than it appears. Its CSV wire format, urllib SSL behavior, and containers.conf dependency are all undocumented constraints that require reading the source code to work around correctly.\nIn Part 2, you will build the Jetson Nano inference container from scratch \u0026ndash; Dockerfile, the DSDL-native Flask application, TLS certificate generation, and the deployment workflow for all four nodes.\nThis post is Part 1 of the Edge AI for SecOps series. Part 2: Building the DSDL-Native Inference Container Part 3: Wiring the Pipeline Part 4: Real Security Data\n","permalink":"https://telemetry-forge.t-security.org/posts/edge-ai-secops-part1/","summary":"\u003cp\u003eMost security teams have the same problem: data volume is growing faster than analyst capacity, and signature-based detection alone is not catching the sophisticated, low-and-slow attacks that matter most. Machine learning promises to help, but production ML in a SOC is genuinely hard to operationalize. The gap between a data science notebook and a running inference pipeline that feeds your SIEM is wider than most blog posts acknowledge.\u003c/p\u003e\n\u003cp\u003eThis series documents the end-to-end build of a working edge AI inference pipeline in a real security lab \u0026ndash; not a cloud demo, not a toy dataset. The hardware is four NVIDIA Jetson Nano 4GB Developer Kit nodes. The SIEM is Splunk Enterprise 10.0 with Enterprise Security. The ML toolkit is Splunk\u0026rsquo;s own Deep Learning Toolkit (DSDL). 
The detection targets are real security data sources: Zeek connection logs, Splunk Stream DNS telemetry, and Windows Security event logs.\u003c/p\u003e","title":"Building an Edge AI Inference Pipeline for Security Operations: Architecture and Concepts (Part 1 of 4) "},{"content":"Introduction In Part 1, you learned why an edge AI inference pipeline is a compelling approach for security operations, and you understood the key constraints imposed by the Jetson Nano 4GB hardware and DSDL 5.2.3\u0026rsquo;s actual behavior. In this post, you will build the inference container \u0026ndash; every file, every command, and every decision explained.\nThe build covers four areas: the Dockerfile and its critical dependencies, the DSDL-native Flask application that implements the correct wire protocol, TLS certificate generation using a lab CA, and the distribution workflow for deploying the finished image to all four nodes.\nBy the end of this post, all four Nano nodes will be running HTTPS inference containers with loaded models, verified from the Splunk search head.\nPrerequisites Four Jetson Nano 4GB Developer Kit nodes with JetPack 4.6.6 installed and Docker running SSH access to all four nodes from a central admin machine OpenSSL available on the machine where you will generate certificates 10 GB free disk space on node 1 for the build process and image export Step 1 \u0026ndash; Understanding the Base Image Choice The most important decision in the Dockerfile is the base image. Splunk\u0026rsquo;s published DSDL container images are built for x86_64 (Intel/AMD) processors. The Jetson Nano is ARM64 (aarch64). Docker will refuse to run an x86_64 image on ARM64 with an exec format error. You cannot work around this without building your own image.\nNVIDIA publishes official ARM64 container images for JetPack through their NGC registry. The image nvcr.io/nvidia/l4t-ml:r32.7.1-py3 is the right choice for JetPack 4.6.6 (L4T R32.7.x). 
This image already contains Python 3.6.9, scikit-learn 0.23.2, numpy 1.19.5, pandas 1.1.5, and scipy \u0026ndash; everything the inference server needs except Flask. By starting here instead of a generic Ubuntu base, you eliminate the need to compile ARM64 Python wheels, which is the most common failure point when building ML containers for Jetson hardware.\nThe r32.7.1 tag is compatible with r32.7.6 on your nodes. Minor point versions within R32.7.x share the same ABI.\nStep 2 \u0026ndash; Writing the Dockerfile Create the project directory on node 1 and write the Dockerfile:\nmkdir -p ~/dsdl-nano cat \u0026gt; ~/dsdl-nano/Dockerfile \u0026lt;\u0026lt; \u0026#39;EOF\u0026#39; FROM nvcr.io/nvidia/l4t-ml:r32.7.1-py3 # Flask 1.x required -- Flask 2+ dropped Python 3.6 support # Werkzeug 1.x pinned to match Flask 1.x\u0026#39;s API expectations RUN pip3 install --no-cache-dir \\ flask==1.1.4 \\ Werkzeug==1.0.1 WORKDIR /app # CRITICAL: create these directories explicitly before any volume mounts # Without this, Docker creates directory placeholders instead of file mounts # causing IsADirectoryError when Flask tries to load the certificate RUN mkdir -p /app/certs /app/model COPY app.py . EXPOSE 8501 # Use HTTPS in healthcheck since container serves TLS when certs are mounted # The -k flag bypasses cert verification -- container lacks the lab CA HEALTHCHECK --interval=15s --timeout=5s --start-period=30s --retries=3 \\ CMD curl -sfk https://localhost:8501/ || exit 1 CMD [\u0026#34;python3\u0026#34;, \u0026#34;app.py\u0026#34;] EOF Two Dockerfile decisions deserve explanation. Flask is pinned to 1.1.4 and Werkzeug to 1.0.1 because Flask 2.0 dropped support for Python 3.6, and Werkzeug 2.0 introduced breaking API changes that Flask 1.x cannot handle. Both pins are hard requirements for this hardware.\nThe RUN mkdir -p /app/certs /app/model line is not optional. 
When Docker mounts a file into a container using a volume bind (-v host_file:/container/path), if the parent directory does not exist in the image, Docker creates a directory at that path instead of a file mount. The result is that ssl_ctx.load_cert_chain(certfile=\u0026quot;/app/certs/chain.pem\u0026quot;) fails with IsADirectoryError: [Errno 21] Is a directory because /app/certs/chain.pem is a directory, not a file. Creating the directories explicitly in the image prevents this.\nStep 3 \u0026ndash; Writing the DSDL-Native Inference Server The Flask application is the most critical piece of this build. Most DSDL container examples online implement a custom JSON protocol with columns and data arrays. That is wrong for DSDL 5.2.x. Reading DSDL\u0026rsquo;s source code at $SPLUNK_HOME/etc/apps/mltk-container/bin/mltkc/MLTKContainer.py reveals the actual protocol:\nDSDL sends data as a CSV string inside a JSON wrapper:\n{ \u0026#34;data\u0026#34;: \u0026#34;bytes_in,bytes_out\\n100,200\\n300,400\\n\u0026#34;, \u0026#34;meta\u0026#34;: { \u0026#34;options\u0026#34;: {\u0026#34;algo\u0026#34;: \u0026#34;isolation_forest\u0026#34;}, \u0026#34;feature_variables\u0026#34;: [\u0026#34;bytes_in\u0026#34;, \u0026#34;bytes_out\u0026#34;] } } DSDL expects results back as a CSV string in a results key:\n{ \u0026#34;status\u0026#34;: \u0026#34;success\u0026#34;, \u0026#34;message\u0026#34;: \u0026#34;...\u0026#34;, \u0026#34;results\u0026#34;: \u0026#34;anomaly_score,is_anomaly,anomaly_label\\n0.23,0,normal\\n0.91,1,ANOMALY\\n\u0026#34; } DSDL then reads the results with pd.read_csv(StringIO(result[\u0026quot;results\u0026quot;])) and merges them with the original dataframe. 
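The round trip described above can be exercised outside Splunk in a few lines of Python; the payload values below mirror the documentation examples, not live data:

```python
# Illustrative stand-alone walk-through of DSDL's CSV-in-JSON wire format.
# Values mirror the example payloads above; nothing here touches Splunk.
import json
from io import StringIO
import pandas as pd

# Request body as DSDL sends it: a CSV string wrapped in JSON
request_body = json.dumps({
    "data": "bytes_in,bytes_out\n100,200\n300,400\n",
    "meta": {"options": {"algo": "isolation_forest"},
             "feature_variables": ["bytes_in", "bytes_out"]},
})
payload = json.loads(request_body)
df = pd.read_csv(StringIO(payload["data"]))  # what the container parses

# Response body as the container must return it: results as a CSV string
scores = pd.DataFrame({"anomaly_score": [0.23, 0.91],
                       "is_anomaly": [0, 1],
                       "anomaly_label": ["normal", "ANOMALY"]})
response = {"status": "success", "message": "ok",
            "results": scores.to_csv(index=False)}

# DSDL-side read-back and merge with the original rows
merged = pd.concat([df, pd.read_csv(StringIO(response["results"]))], axis=1)
print(merged.columns.tolist())
```

The merged frame carries the original feature columns plus the three scored columns, which is exactly what shows up in your Splunk search results after a fit or apply call.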
Any other response format causes a silent JSON parse failure that appears in Splunk as unable to read JSON response.\nCreate the application file:\ncat \u0026gt; ~/dsdl-nano/app.py \u0026lt;\u0026lt; \u0026#39;APPEOF\u0026#39; import json, os, sys, logging, traceback from io import StringIO import numpy as np import pandas as pd import joblib from flask import Flask, request, Response logging.basicConfig(level=logging.INFO, format=\u0026#34;%(asctime)s [%(levelname)s] %(message)s\u0026#34;, stream=sys.stdout) log = logging.getLogger(__name__) app = Flask(__name__) MODEL_PATH = \u0026#34;/app/model/model.pkl\u0026#34; META_PATH = \u0026#34;/app/model/model_meta.json\u0026#34; model = None model_meta = {} SPLUNK_META = {\u0026#34;_time\u0026#34;,\u0026#34;_raw\u0026#34;,\u0026#34;host\u0026#34;,\u0026#34;source\u0026#34;,\u0026#34;sourcetype\u0026#34;,\u0026#34;index\u0026#34;, \u0026#34;splunk_server\u0026#34;,\u0026#34;_bkt\u0026#34;,\u0026#34;_cd\u0026#34;,\u0026#34;_si\u0026#34;} def load_model(): global model, model_meta if os.path.exists(MODEL_PATH): model = joblib.load(MODEL_PATH) if os.path.exists(META_PATH): with open(META_PATH) as f: model_meta = json.load(f) log.info(\u0026#34;Model loaded. 
Features: %s\u0026#34;, model_meta.get(\u0026#34;features\u0026#34;)) def parse_payload(payload): data_csv = payload.get(\u0026#34;data\u0026#34;, \u0026#34;\u0026#34;) meta = payload.get(\u0026#34;meta\u0026#34;, {}) options = meta.get(\u0026#34;options\u0026#34;, {}) fvars = meta.get(\u0026#34;feature_variables\u0026#34;, []) df = pd.read_csv(StringIO(data_csv)) return df, options, fvars def get_features(df, fvars=None): candidates = fvars if fvars else [ c for c in df.columns if c not in SPLUNK_META and c not in (\u0026#34;anomaly_score\u0026#34;,\u0026#34;is_anomaly\u0026#34;,\u0026#34;anomaly_label\u0026#34;)] return [c for c in candidates if c in df.columns and pd.to_numeric(df[c], errors=\u0026#34;coerce\u0026#34;).notna().any()] def dsdl_ok(results_df, message=\u0026#34;\u0026#34;): return Response(json.dumps({ \u0026#34;status\u0026#34;: \u0026#34;success\u0026#34;, \u0026#34;message\u0026#34;: message, \u0026#34;results\u0026#34;: results_df.to_csv(index=False) }), status=200, mimetype=\u0026#34;application/json\u0026#34;) def dsdl_err(msg, code=500): return Response(json.dumps({ \u0026#34;status\u0026#34;: \u0026#34;error\u0026#34;, \u0026#34;message\u0026#34;: msg, \u0026#34;results\u0026#34;: \u0026#34;\u0026#34; }), status=code, mimetype=\u0026#34;application/json\u0026#34;) @app.route(\u0026#34;/\u0026#34;) def health(): return {\u0026#34;status\u0026#34;:\u0026#34;ok\u0026#34;,\u0026#34;model\u0026#34;:\u0026#34;isolation_forest\u0026#34;, \u0026#34;model_loaded\u0026#34;: model is not None, \u0026#34;features\u0026#34;: model_meta.get(\u0026#34;features\u0026#34;,[]), \u0026#34;l4t\u0026#34;:\u0026#34;r32.7.6\u0026#34;,\u0026#34;python\u0026#34;:sys.version.split()[0], \u0026#34;protocol\u0026#34;:\u0026#34;dsdl-native\u0026#34;} @app.route(\u0026#34;/fit\u0026#34;, methods=[\u0026#34;POST\u0026#34;]) def fit(): global model, model_meta try: payload = request.get_json(force=True) df, opts, fv = parse_payload(payload) features = get_features(df, fv) 
if not features: return dsdl_err(\u0026#34;No numeric feature columns found\u0026#34;, 400) X = df[features].fillna(0).astype(float).values log.info(\u0026#34;/fit -- %d samples, features: %s\u0026#34;, len(X), features) from sklearn.ensemble import IsolationForest m = IsolationForest(contamination=float(opts.get(\u0026#34;contamination\u0026#34;,0.05)), n_estimators=int(opts.get(\u0026#34;n_estimators\u0026#34;,100)), random_state=42, n_jobs=-1) m.fit(X) os.makedirs(\u0026#34;/app/model\u0026#34;, exist_ok=True) joblib.dump(m, MODEL_PATH) model_meta = {\u0026#34;features\u0026#34;: features, \u0026#34;n_samples\u0026#34;: len(X)} with open(META_PATH, \u0026#34;w\u0026#34;) as f: json.dump(model_meta, f) model = m raw = m.score_samples(X) preds = m.predict(X) norm = (raw - raw.min()) / (raw.max() - raw.min() + 1e-9) scores = np.round(1.0 - norm, 4) results = pd.DataFrame({ \u0026#34;anomaly_score\u0026#34;: scores, \u0026#34;is_anomaly\u0026#34;: (preds==-1).astype(int), \u0026#34;anomaly_label\u0026#34;: [\u0026#34;ANOMALY\u0026#34; if p==-1 else \u0026#34;normal\u0026#34; for p in preds] }) return dsdl_ok(results, \u0026#34;Isolation Forest trained on %d samples\u0026#34; % len(X)) except Exception as e: log.error(\u0026#34;/fit error: %s\u0026#34;, traceback.format_exc()) return dsdl_err(str(e)) @app.route(\u0026#34;/apply\u0026#34;, methods=[\u0026#34;POST\u0026#34;]) def apply(): if model is None: return dsdl_err(\u0026#34;No model loaded. 
Run fit first.\u0026#34;, 500) try: payload = request.get_json(force=True) df, opts, fv = parse_payload(payload) features = get_features(df, model_meta.get(\u0026#34;features\u0026#34;) or fv) if not features: return dsdl_err(\u0026#34;No numeric feature columns found\u0026#34;, 400) X = df[features].fillna(0).astype(float).values log.info(\u0026#34;/apply -- %d samples\u0026#34;, len(X)) raw = model.score_samples(X) preds = model.predict(X) norm = (raw - raw.min()) / (raw.max() - raw.min() + 1e-9) scores = np.round(1.0 - norm, 4) results = pd.DataFrame({ \u0026#34;anomaly_score\u0026#34;: scores, \u0026#34;is_anomaly\u0026#34;: (preds==-1).astype(int), \u0026#34;anomaly_label\u0026#34;: [\u0026#34;ANOMALY\u0026#34; if p==-1 else \u0026#34;normal\u0026#34; for p in preds] }) anomalies = df.copy() anomalies[\u0026#34;anomaly_score\u0026#34;] = scores anomalies[\u0026#34;is_anomaly\u0026#34;] = (preds==-1).astype(int) rows = anomalies[anomalies[\u0026#34;is_anomaly\u0026#34;]==1].to_dict(orient=\u0026#34;records\u0026#34;) if rows: import threading threading.Thread(target=send_hec, args=(rows,), daemon=True).start() return dsdl_ok(results, \u0026#34;%d anomalies in %d samples\u0026#34; % (int((preds==-1).sum()), len(X))) except Exception as e: log.error(\u0026#34;/apply error: %s\u0026#34;, traceback.format_exc()) return dsdl_err(str(e)) def send_hec(events): import requests as r, random urls = [u.strip() for u in os.environ.get(\u0026#34;SPLUNK_HEC_URLS\u0026#34;,\u0026#34;\u0026#34;).split(\u0026#34;,\u0026#34;) if u.strip()] token = os.environ.get(\u0026#34;SPLUNK_HEC_TOKEN\u0026#34;,\u0026#34;\u0026#34;) index = os.environ.get(\u0026#34;SPLUNK_HEC_INDEX\u0026#34;,\u0026#34;ai_inference\u0026#34;) host = os.environ.get(\u0026#34;SPLUNK_HEC_HOST\u0026#34;,\u0026#34;jetson-nano\u0026#34;) verify = os.environ.get(\u0026#34;SPLUNK_HEC_SSL_VERIFY\u0026#34;,\u0026#34;false\u0026#34;).lower() != \u0026#34;false\u0026#34; if not urls or not token: return payload = 
\u0026#34;\u0026#34;.join(json.dumps({ \u0026#34;sourcetype\u0026#34;:\u0026#34;jetson_inference\u0026#34;,\u0026#34;index\u0026#34;:index, \u0026#34;host\u0026#34;:host,\u0026#34;event\u0026#34;:e}) for e in events) try: r.post(random.choice(urls), headers={\u0026#34;Authorization\u0026#34;:\u0026#34;Splunk \u0026#34;+token}, data=payload, verify=verify, timeout=5) except Exception as ex: log.warning(\u0026#34;HEC push failed: %s\u0026#34;, ex) if __name__ == \u0026#34;__main__\u0026#34;: import ssl as ssl_lib load_model() log.info(\u0026#34;DSDL inference server starting on port 8501\u0026#34;) CERT = \u0026#34;/app/certs/chain.pem\u0026#34; KEY = \u0026#34;/app/certs/node.key\u0026#34; if os.path.exists(CERT) and os.path.exists(KEY): ctx = ssl_lib.SSLContext(ssl_lib.PROTOCOL_TLS_SERVER) ctx.load_cert_chain(certfile=CERT, keyfile=KEY) log.info(\u0026#34;TLS enabled -- serving HTTPS on port 8501\u0026#34;) app.run(host=\u0026#34;0.0.0.0\u0026#34;, port=8501, ssl_context=ctx, debug=False, threaded=True) else: log.warning(\u0026#34;No certs found -- falling back to HTTP\u0026#34;) app.run(host=\u0026#34;0.0.0.0\u0026#34;, port=8501, debug=False, threaded=True) APPEOF Note that all unicode characters are intentionally avoided in this file. Python 3.6 inside a Docker container with no locale configured defaults to ASCII stdout encoding. Any character above U+007F in a print statement or log message will raise UnicodeEncodeError at startup and crash the container. Use plain ASCII throughout.\nStep 4 \u0026ndash; Generating TLS Certificates DSDL requires HTTPS for the container endpoint. You will generate a lab CA and sign a certificate for each node. Run these commands on the machine where you will manage certificates (the search head or a dedicated admin machine). 
The commands use only standard OpenSSL available on any Linux system.\nmkdir -p ~/lab-pki \u0026amp;\u0026amp; cd ~/lab-pki # Generate lab CA -- 4096-bit RSA, 10-year validity for lab use openssl genrsa -out lab-ca.key 4096 openssl req -new -x509 -days 3650 -key lab-ca.key -out lab-ca.crt \\ -subj \u0026#39;/C=US/ST=Lab/L=Lab/O=SecurityLab/OU=InferenceLab/CN=SecurityLab-RootCA\u0026#39; Generate a certificate for each node. The Subject Alternative Name (SAN) block is required \u0026ndash; modern TLS rejects certificates that only have the hostname in CN. You must include both the hostname and IP in the SAN or DSDL\u0026rsquo;s SSL verification will fail:\n# Repeat for each node -- change NODE_NAME and NODE_IP NODE_NAME=\u0026#39;k8clstr01cm\u0026#39; NODE_IP=\u0026#39;10.1.30.23\u0026#39; openssl genrsa -out ${NODE_NAME}.key 2048 cat \u0026gt; ${NODE_NAME}-csr.cnf \u0026lt;\u0026lt; EOF [req] default_bits = 2048 distinguished_name = req_distinguished_name req_extensions = req_ext prompt = no [req_distinguished_name] CN = ${NODE_NAME}.test.lab [req_ext] subjectAltName = @alt_names basicConstraints = CA:FALSE keyUsage = digitalSignature, keyEncipherment extendedKeyUsage = serverAuth [alt_names] DNS.1 = ${NODE_NAME}.test.lab DNS.2 = ${NODE_NAME} IP.1 = ${NODE_IP} EOF openssl req -new -key ${NODE_NAME}.key -out ${NODE_NAME}.csr \\ -config ${NODE_NAME}-csr.cnf openssl x509 -req -days 730 -in ${NODE_NAME}.csr \\ -CA lab-ca.crt -CAkey lab-ca.key -CAcreateserial \\ -out ${NODE_NAME}.crt \\ -extensions req_ext -extfile ${NODE_NAME}-csr.cnf # Full chain: node cert + CA cert -- Flask needs both in one file cat ${NODE_NAME}.crt lab-ca.crt \u0026gt; ${NODE_NAME}-chain.pem # Verify openssl verify -CAfile lab-ca.crt ${NODE_NAME}.crt Distribute the certificates to node 1:\nssh k8admin@10.1.30.23 \u0026#39;mkdir -p ~/dsdl-nano/certs \u0026amp;\u0026amp; chmod 700 ~/dsdl-nano/certs\u0026#39; scp ~/lab-pki/k8clstr01cm.key k8admin@10.1.30.23:~/dsdl-nano/certs/ scp 
~/lab-pki/k8clstr01cm-chain.pem k8admin@10.1.30.23:~/dsdl-nano/certs/ Step 5 \u0026ndash; Building the Docker Image Build on node 1. The --no-cache flag ensures Docker pulls fresh layers and does not reuse a stale cached layer with incorrect directory structure:\n# On k8clstr01cm (10.1.30.23) cd ~/dsdl-nano docker build --no-cache -t dsdl-nano:1.0 . The build will take 3-5 minutes. Most of the time is Flask installation \u0026ndash; the l4t-ml base image already has everything else.\nVerify the image before starting the container:\ndocker images | grep dsdl-nano # Should show: dsdl-nano 1.0 \u0026lt;id\u0026gt; \u0026lt;time\u0026gt; ~2.1GB Step 6 \u0026ndash; Starting the Inference Container Start the container on node 1 with all required environment variables and volume mounts. Replace INDEXER1_IP and INDEXER2_IP with your actual Splunk indexer IPs:\ndocker run -d \\ --name dsdl-inference \\ --runtime=nvidia \\ --restart unless-stopped \\ -p 8501:8501 \\ -e SPLUNK_HEC_URLS=\u0026#39;https://INDEXER1_IP:8088/services/collector,https://INDEXER2_IP:8088/services/collector\u0026#39; \\ -e SPLUNK_HEC_TOKEN=\u0026#39;your-hec-token-guid\u0026#39; \\ -e SPLUNK_HEC_INDEX=\u0026#39;ai_inference\u0026#39; \\ -e SPLUNK_HEC_HOST=\u0026#39;k8clstr01cm\u0026#39; \\ -e SPLUNK_HEC_SSL_VERIFY=\u0026#39;false\u0026#39; \\ -v ~/dsdl-nano/model:/app/model \\ -v ~/dsdl-nano/certs/k8clstr01cm-chain.pem:/app/certs/chain.pem:ro \\ -v ~/dsdl-nano/certs/k8clstr01cm.key:/app/certs/node.key:ro \\ dsdl-nano:1.0 The SPLUNK_HEC_SSL_VERIFY=false setting is correct for the Nanos. They do not have Splunk\u0026rsquo;s internal certificate authority in their trust stores, so they cannot verify the indexers\u0026rsquo; TLS certificates. 
This is acceptable for a lab environment on an isolated network.\nWait 10 seconds for startup, then verify:\nsleep 10 docker logs dsdl-inference | grep -E \u0026#34;TLS|HTTPS|warn|Error\u0026#34; # Expected: TLS enabled -- serving HTTPS on port 8501 Step 7 \u0026ndash; Distributing to Remaining Nodes Build once, distribute to all. Export the finished image from node 1 and copy to the other three nodes:\n# On node 1 -- export the image docker save dsdl-nano:1.0 | gzip \u0026gt; /tmp/dsdl-nano.tar.gz # Copy to remaining nodes for NODE_IP in 10.1.30.24 10.1.30.25 10.1.30.26; do echo \u0026#34;=== Copying to ${NODE_IP} ===\u0026#34; scp /tmp/dsdl-nano.tar.gz k8admin@${NODE_IP}:/tmp/ done On each remaining node, load the image and start the container with the node-specific values:\n# On k8clstr01wk01 (10.1.30.24) -- example docker load \u0026lt; /tmp/dsdl-nano.tar.gz docker run -d \\ --name dsdl-inference \\ --runtime=nvidia \\ --restart unless-stopped \\ -p 8501:8501 \\ -e SPLUNK_HEC_URLS=\u0026#39;https://INDEXER1_IP:8088/services/collector,https://INDEXER2_IP:8088/services/collector\u0026#39; \\ -e SPLUNK_HEC_TOKEN=\u0026#39;your-hec-token-guid\u0026#39; \\ -e SPLUNK_HEC_INDEX=\u0026#39;ai_inference\u0026#39; \\ -e SPLUNK_HEC_HOST=\u0026#39;k8clstr01wk01\u0026#39; \\ -e SPLUNK_HEC_SSL_VERIFY=\u0026#39;false\u0026#39; \\ -v ~/dsdl-nano/model:/app/model \\ -v ~/dsdl-nano/certs/k8clstr01wk01-chain.pem:/app/certs/chain.pem:ro \\ -v ~/dsdl-nano/certs/k8clstr01wk01.key:/app/certs/node.key:ro \\ dsdl-nano:1.0 Only two values change per node: SPLUNK_HEC_HOST and the cert filenames in the volume mounts.\nStep 8 \u0026ndash; Verifying All Four Nodes Run this from your admin machine to verify all four containers are healthy and serving HTTPS:\nfor NODE_IP in 10.1.30.23 10.1.30.24 10.1.30.25 10.1.30.26; do echo -n \u0026#34;${NODE_IP}: \u0026#34; curl -s --cacert ~/lab-pki/lab-ca.crt https://${NODE_IP}:8501/ \\ | python3 -c \u0026#34;import sys,json; d=json.load(sys.stdin); 
print(\u0026#39;status=%s model_loaded=%s protocol=%s\u0026#39; % ( d[\u0026#39;status\u0026#39;], d[\u0026#39;model_loaded\u0026#39;], d[\u0026#39;protocol\u0026#39;]))\u0026#34; \\ 2\u0026gt;/dev/null || echo \u0026#34;FAILED\u0026#34; done All four should return status=ok model_loaded=True protocol=dsdl-native. If any node returns FAILED, check docker logs dsdl-inference on that node for the specific error.\nConclusion You now have four HTTPS inference containers running on Jetson Nano hardware, each implementing DSDL\u0026rsquo;s native CSV wire protocol and ready to receive fit and apply requests from Splunk. The key decisions that make this work are: the l4t-ml base image eliminates ARM64 compilation problems, the explicit mkdir in the Dockerfile prevents Docker\u0026rsquo;s directory-placeholder bug, Flask 1.x handles Python 3.6, the DSDL-native CSV protocol matches what DSDL actually sends rather than what examples show, and the chain PEM file includes the full certificate chain for Flask\u0026rsquo;s TLS context.\nIn Part 3, you will wire this container infrastructure to Splunk \u0026ndash; HEC configuration on the indexer cluster, DSDL setup on the search head, the critical docker.conf and containers.conf configuration that DSDL actually uses, and the end-to-end smoke tests that confirm the full pipeline works.\nPart 1: Architecture and Concepts Part 3: Wiring the Pipeline \u0026ndash; DSDL Configuration, HEC, and Splunk Integration Part 4: Real Security Data \u0026ndash; Training and Deploying Anomaly Detection Models\n","permalink":"https://telemetry-forge.t-security.org/posts/edge-ai-secops-part2/","summary":"\u003ch3 id=\"introduction\"\u003eIntroduction\u003c/h3\u003e\n\u003cp\u003eIn Part 1, you learned why an edge AI inference pipeline is a compelling approach for security operations, and you understood the key constraints imposed by the Jetson Nano 4GB hardware and DSDL 5.2.3\u0026rsquo;s actual behavior. 
In this post, you will build the inference container \u0026ndash; every file, every command, and every decision explained.\u003c/p\u003e\n\u003cp\u003eThe build covers four areas: the Dockerfile and its critical dependencies, the DSDL-native Flask application that implements the correct wire protocol, TLS certificate generation using a lab CA, and the distribution workflow for deploying the finished image to all four nodes.\u003c/p\u003e","title":"Building the DSDL-Native Inference Container on Jetson Nano (Part 2 of 4)"},{"content":"Introduction In Part 2, you built four HTTPS inference containers running on Jetson Nano hardware. They are healthy, serving the DSDL-native protocol, and waiting for requests. In this post, you will wire everything together: Splunk DSDL installed on the search head, HEC configured on the indexer cluster, and the exact configuration files that make DSDL\u0026rsquo;s fit and apply commands route correctly to your containers.\nThis is the most configuration-dense part of the series. It is also where most implementations break down \u0026ndash; not because the concepts are complex, but because DSDL 5.2.3 has several undocumented behaviors that only become visible when you read its Python source code. This post documents those behaviors explicitly so you do not have to discover them through trial and error.\nPrerequisites Splunk Enterprise 10.0 with Enterprise Security installed Python for Scientific Computing add-on (Splunk_SA_Scientific_Python_linux_x86_64) installed AI Toolkit / MLTK (Splunk_ML_Toolkit 5.6.4 or later) installed and enabled All four Nano inference containers running and verified from Part 2 A Splunk indexer cluster managed by a cluster manager node Lab CA certificate at ~/lab-pki/lab-ca.crt on the search head Step 1 \u0026ndash; Verifying Prerequisites Before Installing DSDL DSDL must be installed last. Installing it before MLTK or PSC causes dependency failures that require removing apps to fix. 
Verify both are present:\n| rest /services/apps/local | search title IN (\u0026#34;Splunk_ML_Toolkit\u0026#34;, \u0026#34;Splunk_SA_Scientific_Python_linux_x86_64\u0026#34;) | table title, version, disabled Both apps should appear with disabled blank (meaning enabled). Also verify MLTK permissions are set to Global \u0026ndash; DSDL cannot find MLTK commands if they are scoped to a single app:\nNavigate to Apps \u0026gt; Manage Apps Find Machine Learning Toolkit and click Permissions Confirm Object should appear in is set to All Apps If you are running Splunk Enterprise Security, this is almost certainly already set correctly since ES requires it Step 2 \u0026ndash; Configuring HEC on the Indexer Cluster In a Splunk indexer cluster, HEC configuration must be distributed via the cluster manager bundle. Do not configure HEC through the Splunk web UI on individual indexers \u0026ndash; you will end up with inconsistent token configurations across peers.\nCreate the HEC configuration app on the cluster manager. 
The ta_jetson_hec app contains three files: inputs.conf for the HEC token, indexes.conf to create the target index, and app.conf for the app metadata:\n# On the cluster manager mkdir -p $SPLUNK_HOME/etc/master-apps/ta_jetson_hec/{default,local,metadata} # inputs.conf -- HEC global settings and shared token cat \u0026gt; $SPLUNK_HOME/etc/master-apps/ta_jetson_hec/local/inputs.conf \u0026lt;\u0026lt; \u0026#39;EOF\u0026#39; [http] disabled=0 enableSSL=1 port=8088 dedicatedIoThreads=2 maxSockets=0 maxThreads=0 useDeploymentServer=0 [http://jetson-inference-nodes] disabled=0 token=21a7297a-b7fb-4209-b719-72c4fb58f38d index=ai_inference indexes=ai_inference sourcetype=jetson_inference EOF # indexes.conf -- create the target index on all peers cat \u0026gt; $SPLUNK_HOME/etc/master-apps/ta_jetson_hec/local/indexes.conf \u0026lt;\u0026lt; \u0026#39;EOF\u0026#39; [ai_inference] disabled=false homePath = $SPLUNK_DB/ai_inference/db coldPath = $SPLUNK_DB/ai_inference/colddb thawedPath = $SPLUNK_DB/ai_inference/thaweddb maxTotalDataSizeMB=10240 frozenTimePeriodInSecs=7776000 EOF # app.conf cat \u0026gt; $SPLUNK_HOME/etc/master-apps/ta_jetson_hec/default/app.conf \u0026lt;\u0026lt; \u0026#39;EOF\u0026#39; [launcher] author=SecurityLab version=1.0.0 description=HEC receiver for Jetson Nano DSDL inference nodes [package] id=ta_jetson_hec [ui] is_visible=false label=TA Jetson HEC EOF Validate and push the bundle:\n# Validate -- confirm no config errors before pushing splunk validate cluster-bundle --check-restart # Push to all peers splunk apply cluster-bundle Also copy the app to the search head so the ai_inference index definition is available at search time. The search head will ignore the HEC [http] stanzas \u0026ndash; only indexer peers act on those:\ncp -r $SPLUNK_HOME/etc/master-apps/ta_jetson_hec \\ $SPLUNK_HOME/etc/apps/ta_jetson_hec Verify HEC is working from each Nano. 
Run this from node 1 and replace INDEXER1_IP with your actual indexer IP:\ncurl -k https://INDEXER1_IP:8088/services/collector \\ -H \u0026#39;Authorization: Splunk 21a7297a-b7fb-4209-b719-72c4fb58f38d\u0026#39; \\ -H \u0026#39;Content-Type: application/json\u0026#39; \\ -d \u0026#39;{\u0026#34;sourcetype\u0026#34;:\u0026#34;jetson_inference\u0026#34;,\u0026#34;index\u0026#34;:\u0026#34;ai_inference\u0026#34;, \u0026#34;host\u0026#34;:\u0026#34;k8clstr01cm\u0026#34;,\u0026#34;event\u0026#34;:{\u0026#34;test\u0026#34;:\u0026#34;hec_ok\u0026#34;}}\u0026#39; # Expected: {\u0026#34;text\u0026#34;:\u0026#34;Success\u0026#34;,\u0026#34;code\u0026#34;:0} If you get No route to host, the host-based firewall on the indexer is blocking port 8088. On each indexer:\nsudo iptables -I INPUT -p tcp --dport 8088 -j ACCEPT Step 3 \u0026ndash; Installing DSDL on the Search Head Install DSDL from Splunkbase (app ID 4607). If the search head cannot reach Splunkbase directly, download it on a machine with internet access and install via CLI:\n$SPLUNK_HOME/bin/splunk install app \\ /tmp/splunk-app-for-data-science-and-deep-learning_523.tgz \\ -auth admin:YOUR_PASSWORD When prompted, restart Splunk. On a search head running Enterprise Security, restart takes 2-4 minutes including ES initialization. Do not interact with DSDL until ES Incident Review loads normally \u0026ndash; attempting DSDL configuration while ES is still initializing causes false Test \u0026amp; Save failures.\nStep 4 \u0026ndash; Preparing the Lab CA for DSDL This is the step most implementations miss. DSDL\u0026rsquo;s endpoint() method uses Python\u0026rsquo;s urllib library (not requests) with a dynamically constructed SSL context. 
The relevant code in MLTKContainer.py:\nserver_cert = ssl.get_server_certificate((url_parsed.hostname, url_parsed.port)) ssl_context = ssl.create_default_context(cadata=server_cert) ssl.get_server_certificate() fetches only the leaf certificate \u0026ndash; it does not fetch the CA certificate. When DSDL tries to verify the chain with ssl.create_default_context(cadata=server_cert), it has the Nano\u0026rsquo;s certificate but not the CA that signed it. This produces the error SSL CERTIFICATE_VERIFY_FAILED: self-signed certificate in certificate chain \u0026ndash; the same error you get when you run curl without -k against a self-signed cert.\nThe fix is to tell DSDL to use the CA file directly instead of the dynamically fetched cert. DSDL already supports this through the endpoint_cert_filename_or_path key in docker.conf:\n# In MLTKContainer.py endpoint_cert_filename_or_path handling: if cert_file_or_path: ssl_context = ssl.create_default_context(cafile=endpoint_cert_filename_or_path) Copy the lab CA to the DSDL app directory:\ncp ~/lab-pki/lab-ca.crt \\ $SPLUNK_HOME/etc/apps/mltk-container/local/lab-ca.crt Step 5 \u0026ndash; Configuring DSDL via the Setup UI (Environment 1) Navigate to DSDL app \u0026gt; Configuration \u0026gt; Setup. Fill in the Docker Settings for Environment 1:\nField Value Docker Host tcp://10.1.30.23:2375 Endpoint URL https://10.1.30.23:8501 External URL https://10.1.30.23:8501 Leave the Splunk Docker Logging fields blank. The Jupyter Notebook settings are entirely optional for this use case \u0026ndash; skip them. Click Test \u0026amp; Save. A green checkmark confirms Splunk can reach the container on both port 2375 (Docker API) and port 8501 (inference endpoint).\nThe Setup UI only configures Environment 1. Environment 2 is added directly to docker.conf.\nStep 6 \u0026ndash; Writing the Complete docker.conf This is the most important configuration file. 
The Splunk config layering system merges default/docker.conf (which has blank values for all keys) with local/docker.conf. If your local file only has the keys you care about, blank values from default win for the rest. You must explicitly define every key that appears in default/docker.conf to ensure local values override default values.\nWrite the complete file:\ncat \u0026gt; $SPLUNK_HOME/etc/apps/mltk-container/local/docker.conf \u0026lt;\u0026lt; \u0026#39;EOF\u0026#39; [connection] api_token = WSDG3A5R94Z0D822K6FWRO6AXZAQ0F36NCI6RELCMKGO7HRZ03HRME7B2BJ67MWH api_workers = 1 container_enable_https = 1 container_enable_keepalive = 0 docker_url = tcp://10.1.30.23:2375 docker_network = # This is the fix for DSDL\u0026#39;s SSL verification with self-signed certs # DSDL reads this and builds ssl.create_default_context(cafile=\u0026lt;path\u0026gt;) # which correctly includes the CA chain rather than just the leaf cert endpoint_cert_filename_or_path = /home/splunk/etc/apps/mltk-container/local/lab-ca.crt endpoint_cert_check_hostname = 0 endpoint_hostname = https://10.1.30.23:8501 endpoint_hostname_external = https://10.1.30.23:8501 docker_logging_endpoint_hostname = docker_logging_splunk_token = image_pull_secrets = None in_cluster_mode = false is_configured_complete = 1 olly_enabled = 0 olly_splunk_access_token = olly_otel_endpoint = olly_otel_service_name = splunk_access_enabled = 0 splunk_access_token = splunk_access_host = splunk_access_port = splunk_hec_enabled = 0 splunk_hec_token = splunk_hec_url = [connection2] api_token = WSDG3A5R94Z0D822K6FWRO6AXZAQ0F36NCI6RELCMKGO7HRZ03HRME7B2BJ67MWH api_workers = 1 container_enable_https = 1 container_enable_keepalive = 0 docker_url = tcp://10.1.30.24:2375 docker_network = endpoint_cert_filename_or_path = /home/splunk/etc/apps/mltk-container/local/lab-ca.crt endpoint_cert_check_hostname = 0 endpoint_hostname = https://10.1.30.24:8501 endpoint_hostname_external = https://10.1.30.24:8501 docker_logging_endpoint_hostname = 
docker_logging_splunk_token = image_pull_secrets = None in_cluster_mode = false is_configured_complete = 1 olly_enabled = 0 olly_splunk_access_token = olly_otel_endpoint = olly_otel_service_name = splunk_access_enabled = 0 splunk_access_token = splunk_access_host = splunk_access_port = splunk_hec_enabled = 0 splunk_hec_token = splunk_hec_url = EOF The api_token value is DSDL\u0026rsquo;s own internal authorization token \u0026ndash; it is generated when you run the Setup UI and is used as an Authorization header in every request DSDL sends to your container. Your container app.py does not need to validate this token (the DSDL-native container examples do not either), but DSDL will not send requests without it.\nStep 7 \u0026ndash; Writing containers.conf DSDL uses containers.conf to map model names to container endpoints. This file is separate from docker.conf and is not configured through the Setup UI. Without a [__dev__] stanza, every fit and apply call fails with no config found for model name -- switching to default __dev__ container followed by a blank endpoint error.\nAdd a stanza for every model you plan to train before running the fit command:\ncat \u0026gt; $SPLUNK_HOME/etc/apps/mltk-container/local/containers.conf \u0026lt;\u0026lt; \u0026#39;EOF\u0026#39; [default] [__dev__] api_url = https://10.1.30.23:8501 api_url_external = https://10.1.30.23:8501 environment = connection [smoke_test_model] api_url = https://10.1.30.23:8501 api_url_external = https://10.1.30.23:8501 environment = connection [zeek_net_anomaly] api_url = https://10.1.30.23:8501 api_url_external = https://10.1.30.23:8501 environment = connection [dns_tunnel_anomaly] api_url = https://10.1.30.24:8501 api_url_external = https://10.1.30.24:8501 environment = connection2 [auth_anomaly] api_url = https://10.1.30.24:8501 api_url_external = https://10.1.30.24:8501 environment = connection2 EOF Restart Splunk to pick up all configuration changes:\n$SPLUNK_HOME/bin/splunk restart Step 8 
\u0026ndash; End-to-End Smoke Tests Run all five tests in order. Each validates a layer the next test depends on.\nTest 1 \u0026ndash; DSDL urllib SSL connection (simulates exactly what DSDL does):\n$SPLUNK_HOME/bin/splunk cmd python3 \u0026lt;\u0026lt; \u0026#39;PYEOF\u0026#39; import urllib.request as urllib_request, ssl, json url = \u0026#39;https://10.1.30.23:8501/fit\u0026#39; api_token = \u0026#39;WSDG3A5R94Z0D822K6FWRO6AXZAQ0F36NCI6RELCMKGO7HRZ03HRME7B2BJ67MWH\u0026#39; ca_file = \u0026#39;/home/splunk/etc/apps/mltk-container/local/lab-ca.crt\u0026#39; ssl_ctx = ssl.create_default_context(cafile=ca_file) ssl_ctx.check_hostname = False payload = {\u0026#34;data\u0026#34;: \u0026#34;bytes_in,bytes_out\\n100,200\\n300,400\\n\u0026#34;, \u0026#34;meta\u0026#34;: {\u0026#34;options\u0026#34;: {}, \u0026#34;feature_variables\u0026#34;: [\u0026#34;bytes_in\u0026#34;,\u0026#34;bytes_out\u0026#34;]}} data_encoded = str.encode(json.dumps(payload)) header = {\u0026#39;Authorization\u0026#39;: api_token, \u0026#39;Content-Type\u0026#39;: \u0026#39;application/json\u0026#39;} req = urllib_request.Request(url, data_encoded, header) try: response = urllib_request.urlopen(req, context=ssl_ctx) parsed = json.loads(response.read()) print(\u0026#39;SUCCESS -- status:\u0026#39;, parsed.get(\u0026#39;status\u0026#39;)) except Exception as e: print(\u0026#39;FAILED:\u0026#39;, str(e)) PYEOF Expected: SUCCESS -- status: success\nTest 2 \u0026ndash; Splunk fit command:\n| makeresults count=50 | eval bytes_in=random()%10000, bytes_out=random()%6000, duration_sec=round(random()%10+0.1,2), packet_count=random()%100, unique_ports=random()%8, failed_logins=random()%5 | fit MLTKContainer algo=isolation_forest bytes_in, bytes_out, duration_sec, packet_count, unique_ports, failed_logins into app:smoke_test_model environment=1 | table bytes_in, bytes_out, anomaly_score, is_anomaly, anomaly_label Expected: 50 rows with anomaly_score, is_anomaly, and anomaly_label populated.\nTest 3 
\u0026ndash; Splunk apply command:\n| makeresults count=10 | eval bytes_in=random()%10000, bytes_out=random()%6000, duration_sec=round(random()%10+0.1,2), packet_count=random()%100, unique_ports=random()%8, failed_logins=random()%5 | apply smoke_test_model | table bytes_in, bytes_out, anomaly_score, is_anomaly, anomaly_label Important: the correct apply syntax is | apply model_name \u0026ndash; the model name is the first positional argument. | apply MLTKContainer algo=... is not how MLTK\u0026rsquo;s apply command works. MLTK infers the algorithm from the saved model registry.\nTest 4 \u0026ndash; HEC events in Splunk:\nindex=ai_inference sourcetype=jetson_inference earliest=-15m | table _time, host, anomaly_score, is_anomaly, anomaly_label | sort -anomaly_score Expected: rows with host=k8clstr01cm and anomaly scores populated. These events were pushed back to Splunk by the container\u0026rsquo;s background HEC thread when anomalies were detected during the apply call.\nConclusion The pipeline is fully wired. Splunk can train models on the Nano containers, apply those models to live data, and receive anomalous events back via HEC \u0026ndash; all without leaving the Splunk SPL interface.\nThe three DSDL behaviors that are not in the documentation but that you now understand from source code: the CSV wire protocol, the urllib SSL context behavior requiring endpoint_cert_filename_or_path, and the containers.conf model registry that must be pre-populated before fit runs. 
These three things account for the majority of DSDL deployment failures on non-standard hardware.\nIn Part 4, you will train real models on your actual security data \u0026ndash; Zeek connection logs, Splunk Stream DNS, and Windows Security events \u0026ndash; and build ES correlation rules that generate actionable notable events from AI-scored anomalies.\nPart 1: Architecture and Concepts Part 2: Building the DSDL-Native Inference Container on Jetson Nano Part 4: Real Security Data \u0026ndash; Training and Deploying Anomaly Detection Models\n","permalink":"https://telemetry-forge.t-security.org/posts/edge-ai-secops-part3/","summary":"\u003ch3 id=\"introduction\"\u003eIntroduction\u003c/h3\u003e\n\u003cp\u003eIn Part 2, you built four HTTPS inference containers running on Jetson Nano hardware. They are healthy, serving the DSDL-native protocol, and waiting for requests. In this post, you will wire everything together: Splunk DSDL installed on the search head, HEC configured on the indexer cluster, and the exact configuration files that make DSDL\u0026rsquo;s fit and apply commands route correctly to your containers.\u003c/p\u003e\n\u003cp\u003eThis is the most configuration-dense part of the series. It is also where most implementations break down \u0026ndash; not because the concepts are complex, but because DSDL 5.2.3 has several undocumented behaviors that only become visible when you read its Python source code. This post documents those behaviors explicitly so you do not have to discover them through trial and error.\u003c/p\u003e","title":"Wiring the Pipeline: DSDL Configuration, HEC, and Splunk Integration (Part 3 of 4)"},{"content":"Introduction The first three parts of this series built the infrastructure. This final post uses it. You now have four Jetson Nano inference containers running HTTPS, DSDL wired to Splunk, and HEC delivering scored events back to your indexers. 
In this post, you will train three Isolation Forest models on real security data and build the ES correlation rules that turn AI-scored anomalies into actionable notable events.\nThe three detection use cases are chosen for complementary coverage across the kill chain. Zeek connection log anomaly detection catches behavioral outliers in network traffic \u0026ndash; the kinds of connections that do not match your environment\u0026rsquo;s normal patterns. DNS anomaly detection catches tunneling, command-and-control beaconing, and DGA callbacks that generate no network flow anomalies because they hide inside legitimate DNS traffic. Windows authentication anomaly detection catches credential abuse, brute force, and lateral movement patterns that look like legitimate events when examined individually but are anomalous in aggregate.\nTogether these three models cover T1046, T1071, T1048, T1110, T1078, and T1021 \u0026ndash; a meaningful portion of the techniques that appear in advanced persistent threat activity.\nPrerequisites All four Nano inference containers healthy from Part 3 DSDL fit and apply smoke tests passing from Part 3 Zeek connection logs indexed at index=net_data sourcetype=\u0026quot;bro:conn:json\u0026quot; Splunk Stream DNS data indexed at index=stream sourcetype=\u0026quot;stream:dns\u0026quot; Windows Security event logs indexed at index=wineventlog sourcetype=\u0026quot;WinEventLog:Security\u0026quot; Minimum 7 days of each data source for viable baseline models Step 1 \u0026ndash; Understanding What Isolation Forest Detects Before training, understanding what Isolation Forest actually measures prevents misinterpreting its output.\nIsolation Forest is an unsupervised anomaly detection algorithm. It works by building random trees that partition the feature space. Points that are easy to isolate \u0026ndash; requiring few partitions to separate from the rest \u0026ndash; receive high anomaly scores. 
Points in dense, predictable regions require many partitions and receive low anomaly scores.\nFor security data, this translates to behavioral outliers: connections with unusual byte ratios, DNS queries with unusually long subdomains, users authenticating from unusual numbers of machines. The model has no concept of malicious versus benign \u0026ndash; it only knows normal versus statistically unusual. Your correlation rules are what assign security meaning to high anomaly scores.\nThe contamination parameter tells the model what fraction of training data to treat as anomalies. Setting it to 0.05 sets the scoring threshold so that roughly 5% of the training data falls on the anomalous side of it. For a well-tuned model on security data, start at 0.05 and tune based on your actual false positive rate after a week of operation.\nStep 2 \u0026ndash; Job 1: Zeek Connection Log Anomaly Detection Run a field discovery search first to confirm your exact Zeek field names before running the training job:\nindex=net_data sourcetype=\u0026#34;bro:conn:json\u0026#34; earliest=-1h | head 100 | fieldsummary | table field, count, distinct_count | sort -count Confirm these fields exist: duration, orig_bytes, resp_bytes, orig_pkts, resp_pkts, id.resp_p, conn_state.\nThe log transform on byte counts is important and not optional. Raw byte values span six orders of magnitude \u0026ndash; a single large file transfer dwarfs hundreds of small connections in the feature space. 
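To see the compression concretely, here is a small standalone Python sketch (the byte counts are illustrative, not pipeline data) of the same base-10 log(x+1) transform the SPL evals apply:

```python
import math

# Hypothetical orig_bytes values: a DNS lookup, a web page, a bulk transfer
raw_bytes = [220, 48_000, 1_500_000_000]

# Same shape as the SPL eval: log_orig_bytes=log(orig_bytes+1)
log_bytes = [round(math.log10(b + 1), 2) for b in raw_bytes]

print(log_bytes)                        # [2.34, 4.68, 9.18]
print(max(raw_bytes) / min(raw_bytes))  # raw spread: millions to one
print(max(log_bytes) / min(log_bytes))  # log spread: roughly four to one
```

SPL's log() defaults to base 10, so math.log10 mirrors the eval exactly: one unit of log-feature distance is one order of magnitude, which keeps the bulk transfer from dominating the tree splits.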
Without log scaling the model fixates on the largest values and ignores behavioral anomalies in smaller flows, which is exactly where command-and-control and lateral movement traffic hides.\nFIT \u0026ndash; run once against 7 days of Zeek data:\nindex=net_data sourcetype=\u0026#34;bro:conn:json\u0026#34; earliest=-7d | where isnotnull(duration) AND isnotnull(orig_bytes) AND isnotnull(resp_bytes) | eval conn_state_enc=case( conn_state=\u0026#34;SF\u0026#34;, 1, conn_state=\u0026#34;S0\u0026#34;, 0, conn_state=\u0026#34;REJ\u0026#34;, 2, conn_state=\u0026#34;RSTO\u0026#34;, 3, conn_state=\u0026#34;RSTOS0\u0026#34;, 4, conn_state=\u0026#34;SHR\u0026#34;, 5, conn_state=\u0026#34;OTH\u0026#34;, 6, true(), 7) | eval bytes_ratio=round(orig_bytes/(resp_bytes+1),4) | eval log_duration=if(duration\u0026gt;0,log(duration),0) | eval log_orig_bytes=if(orig_bytes\u0026gt;0,log(orig_bytes+1),0) | eval log_resp_bytes=if(resp_bytes\u0026gt;0,log(resp_bytes+1),0) | table log_duration, log_orig_bytes, log_resp_bytes, orig_pkts, resp_pkts, \u0026#34;id.resp_p\u0026#34;, bytes_ratio, conn_state_enc | fit MLTKContainer algo=isolation_forest log_duration, log_orig_bytes, log_resp_bytes, orig_pkts, resp_pkts, \u0026#34;id.resp_p\u0026#34;, bytes_ratio, conn_state_enc into app:zeek_net_anomaly environment=1 After fit completes, verify the model by checking what scores highest. 
You should see short-duration zero-byte flows (scans), high destination port numbers, and upload-heavy byte ratios scoring above 0.8. The conn_state encoding must reproduce the training encoding exactly \u0026ndash; a collapsed case() at apply time would map states like REJ and RSTO to the fallback value and silently skew scores:\nindex=net_data sourcetype=\u0026#34;bro:conn:json\u0026#34; earliest=-1h | where isnotnull(duration) AND isnotnull(orig_bytes) | eval conn_state_enc=case(conn_state=\u0026#34;SF\u0026#34;,1,conn_state=\u0026#34;S0\u0026#34;,0,conn_state=\u0026#34;REJ\u0026#34;,2,conn_state=\u0026#34;RSTO\u0026#34;,3,conn_state=\u0026#34;RSTOS0\u0026#34;,4,conn_state=\u0026#34;SHR\u0026#34;,5,conn_state=\u0026#34;OTH\u0026#34;,6,true(),7) | eval bytes_ratio=round(orig_bytes/(resp_bytes+1),4) | eval log_duration=if(duration\u0026gt;0,log(duration),0) | eval log_orig_bytes=if(orig_bytes\u0026gt;0,log(orig_bytes+1),0) | eval log_resp_bytes=if(resp_bytes\u0026gt;0,log(resp_bytes+1),0) | apply zeek_net_anomaly | where anomaly_score \u0026gt; 0.80 | table _time, \u0026#34;id.orig_h\u0026#34;, \u0026#34;id.resp_h\u0026#34;, \u0026#34;id.resp_p\u0026#34;, conn_state, orig_bytes, resp_bytes, duration, anomaly_score, anomaly_label | sort -anomaly_score ES Correlation Rule SPL (create in Security \u0026gt; Content Management \u0026gt; Create New Content \u0026gt; Correlation Search). Note the single quotes in the eval \u0026ndash; in eval, \u0026#39;id.orig_h\u0026#39; references the dotted field name, while double quotes would assign the literal string:\nindex=net_data sourcetype=\u0026#34;bro:conn:json\u0026#34; earliest=-15m latest=now | where isnotnull(duration) AND isnotnull(orig_bytes) | eval conn_state_enc=case(conn_state=\u0026#34;SF\u0026#34;,1,conn_state=\u0026#34;S0\u0026#34;,0,conn_state=\u0026#34;REJ\u0026#34;,2,conn_state=\u0026#34;RSTO\u0026#34;,3,conn_state=\u0026#34;RSTOS0\u0026#34;,4,conn_state=\u0026#34;SHR\u0026#34;,5,conn_state=\u0026#34;OTH\u0026#34;,6,true(),7) | eval bytes_ratio=round(orig_bytes/(resp_bytes+1),4) | eval log_duration=if(duration\u0026gt;0,log(duration),0) | eval log_orig_bytes=if(orig_bytes\u0026gt;0,log(orig_bytes+1),0) | eval log_resp_bytes=if(resp_bytes\u0026gt;0,log(resp_bytes+1),0) | apply zeek_net_anomaly | where anomaly_score \u0026gt; 0.80 | eval src=\u0026#39;id.orig_h\u0026#39;, dest=\u0026#39;id.resp_h\u0026#39;, dest_port=\u0026#39;id.resp_p\u0026#39; | eval signature=\u0026#34;AI_NETWORK_BEHAVIOR_ANOMALY\u0026#34; | eval severity=if(anomaly_score\u0026gt;0.90,\u0026#34;critical\u0026#34;,\u0026#34;high\u0026#34;) | table _time, src, dest, dest_port, anomaly_score, severity, signature Schedule this correlation rule to run every 15 minutes. 
Set the threshold to 0.80 for initial deployment \u0026ndash; tune down to 0.75 after a week of baseline once you understand your false positive rate.\nStep 3 \u0026ndash; Job 2: DNS Tunneling and DGA Detection DNS is the protocol that attackers abuse most reliably because most environments allow it to reach the internet without inspection. DNS tunneling encodes data in query names to exfiltrate information or receive command-and-control instructions. DGA (Domain Generation Algorithm) malware computationally generates hundreds of unique domain names that are statistically distinct from human-registered domains.\nThe key features that distinguish tunneling and DGA traffic from normal DNS are query length, entropy, NXDOMAIN rate, and TXT record usage. Tunneling tools base64-encode payloads in query names, making them unusually long. DGA names have high character entropy because they are pseudorandom. High NXDOMAIN rates occur because most DGA domains are never registered. Tunneling tools use TXT records heavily because they can carry the most payload per query.\nRun field discovery first to confirm your exact Splunk Stream DNS field names \u0026ndash; they vary by Stream version:\nindex=stream sourcetype=\u0026#34;stream:dns\u0026#34; earliest=-1h | head 100 | fieldsummary | table field, count, distinct_count | sort -count Confirm: query, record_type, reply_code, src_ip. 
If fields are named differently (e.g., query_name instead of query), adjust the fit SPL accordingly. Core SPL has no built-in Shannon entropy function, so the feature set below uses query-length statistics (avg_query_len, query_len_stdev) as a practical stand-in for character entropy \u0026ndash; if you run the URL Toolbox app, its ut_shannon lookup can add a true entropy feature.\nFIT \u0026ndash; aggregates per source IP over one-hour windows before training:\nindex=stream sourcetype=\u0026#34;stream:dns\u0026#34; earliest=-7d | eval query_len=len(query) | eval subdomain_len=len(mvindex(split(query,\u0026#34;.\u0026#34;),0)) | eval is_txt=if(match(record_type,\u0026#34;TXT\u0026#34;),1,0) | eval is_nxdomain=if(reply_code=\u0026#34;NXDOMAIN\u0026#34;,1,0) | bin _time span=1h | stats count as query_count, avg(query_len) as avg_query_len, avg(subdomain_len) as avg_subdomain_len, dc(query) as unique_domains, avg(is_txt) as txt_ratio, avg(is_nxdomain) as nxdomain_ratio, stdev(query_len) as query_len_stdev by src_ip, _time | fillnull value=0 | fit MLTKContainer algo=isolation_forest avg_query_len, avg_subdomain_len, query_count, unique_domains, txt_ratio, nxdomain_ratio, query_len_stdev into app:dns_tunnel_anomaly environment=2 APPLY (run every 5 minutes \u0026ndash; DNS events are fast-moving):\nindex=stream sourcetype=\u0026#34;stream:dns\u0026#34; earliest=-5m | eval query_len=len(query) | eval subdomain_len=len(mvindex(split(query,\u0026#34;.\u0026#34;),0)) | eval is_txt=if(match(record_type,\u0026#34;TXT\u0026#34;),1,0) | eval is_nxdomain=if(reply_code=\u0026#34;NXDOMAIN\u0026#34;,1,0) | bin _time span=5m | stats count as query_count, avg(query_len) as avg_query_len, avg(subdomain_len) as avg_subdomain_len, dc(query) as unique_domains, avg(is_txt) as txt_ratio, avg(is_nxdomain) as nxdomain_ratio, stdev(query_len) as query_len_stdev by src_ip, _time | fillnull value=0 | apply dns_tunnel_anomaly | where anomaly_score \u0026gt; 0.78 | table _time, src_ip, avg_query_len, avg_subdomain_len, query_count, txt_ratio, nxdomain_ratio, anomaly_score, anomaly_label | sort -anomaly_score Note that the 5-minute apply window is narrower than the 1-hour training window, so count-based features such as query_count and unique_domains skew low against the baseline \u0026ndash; keep the 0.78 threshold conservative, or retrain on 5-minute aggregates if you want the windows to match.\nStep 4 \u0026ndash; Job 3: Windows Authentication Anomaly Detection Windows Security event logs contain rich behavioral data for authentication anomaly detection. 
The key EventCodes are 4624 (successful logon), 4625 (failed logon), 4648 (explicit credentials \u0026ndash; RunAs or pass-the-hash indicator), and 4672 (special privileges assigned \u0026ndash; admin logon indicator).\nThe power of this model is not in detecting individual events \u0026ndash; Splunk ES already does that with threshold rules. The power is in detecting behavioral patterns that emerge only in aggregate: a user whose fail-to-success ratio spikes at 2 AM, a service account that suddenly starts authenticating to hosts it has never touched, a privileged user who logs in from three different source IPs in a single hour.\nThe aggregation window is one hour per user. This deliberately hides individual events and surfaces behavioral patterns, which is what makes the model complementary to signature-based detection rather than redundant with it.\nFIT \u0026ndash; 14 days of data to capture enough baseline variation including weekends and off-hours patterns:\nindex=wineventlog sourcetype=\u0026#34;WinEventLog:Security\u0026#34; EventCode IN (4624,4625,4648,4672,4768,4769,4776) earliest=-14d NOT user IN (\u0026#34;SYSTEM\u0026#34;,\u0026#34;LOCAL SERVICE\u0026#34;,\u0026#34;NETWORK SERVICE\u0026#34;,\u0026#34;*$\u0026#34;) NOT user=\u0026#34;\u0026#34; | eval is_success=if(EventCode=4624,1,0) | eval is_failure=if(EventCode=4625,1,0) | eval is_network=if(EventCode=4624 AND Logon_Type=\u0026#34;3\u0026#34;,1,0) | eval is_explicit=if(EventCode=4648,1,0) | eval is_priv=if(EventCode=4672,1,0) | eval is_kerb=if(EventCode IN (4768,4769),1,0) | eval is_ntlm=if(EventCode=4776,1,0) | eval hour=tonumber(strftime(_time,\u0026#34;%H\u0026#34;)) | eval offhours=if(hour\u0026gt;=22 OR hour\u0026lt;=6,1,0) | bin _time span=1h | stats sum(is_success) as success_logins, sum(is_failure) as failed_logins, sum(is_network) as network_logons, sum(is_explicit) as explicit_cred_count, sum(is_priv) as priv_logons, sum(is_kerb) as kerb_tickets, sum(is_ntlm) as ntlm_auths, sum(offhours) 
as offhours_events, dc(src_ip) as unique_src_hosts, dc(ComputerName) as unique_dest_hosts by user, _time | eval fail_ratio=round(failed_logins/(success_logins+1),4) | eval offhours_flag=if(offhours_events\u0026gt;0,1,0) | fillnull value=0 | fit MLTKContainer algo=isolation_forest failed_logins, success_logins, fail_ratio, network_logons, unique_src_hosts, unique_dest_hosts, explicit_cred_count, priv_logons, ntlm_auths, kerb_tickets, offhours_flag into app:auth_anomaly environment=2 APPLY (run every 30 minutes \u0026ndash; it must compute every feature the model was trained on, or scoring silently breaks):\nindex=wineventlog sourcetype=\u0026#34;WinEventLog:Security\u0026#34; EventCode IN (4624,4625,4648,4672,4768,4769,4776) earliest=-30m NOT user IN (\u0026#34;SYSTEM\u0026#34;,\u0026#34;LOCAL SERVICE\u0026#34;,\u0026#34;NETWORK SERVICE\u0026#34;,\u0026#34;*$\u0026#34;) NOT user=\u0026#34;\u0026#34; | eval is_success=if(EventCode=4624,1,0) | eval is_failure=if(EventCode=4625,1,0) | eval is_network=if(EventCode=4624 AND Logon_Type=\u0026#34;3\u0026#34;,1,0) | eval is_explicit=if(EventCode=4648,1,0) | eval is_priv=if(EventCode=4672,1,0) | eval is_kerb=if(EventCode IN (4768,4769),1,0) | eval is_ntlm=if(EventCode=4776,1,0) | eval hour=tonumber(strftime(_time,\u0026#34;%H\u0026#34;)) | eval offhours=if(hour\u0026gt;=22 OR hour\u0026lt;=6,1,0) | stats sum(is_success) as success_logins, sum(is_failure) as failed_logins, sum(is_network) as network_logons, sum(is_explicit) as explicit_cred_count, sum(is_priv) as priv_logons, sum(is_kerb) as kerb_tickets, sum(is_ntlm) as ntlm_auths, sum(offhours) as offhours_events, dc(src_ip) as unique_src_hosts, dc(ComputerName) as unique_dest_hosts by user | eval fail_ratio=round(failed_logins/(success_logins+1),4) | eval offhours_flag=if(offhours_events\u0026gt;0,1,0) | fillnull value=0 | apply auth_anomaly | where anomaly_score \u0026gt; 0.75 | table user, failed_logins, success_logins, network_logons, unique_dest_hosts, offhours_flag, anomaly_score, anomaly_label | sort -anomaly_score Step 5 \u0026ndash; Tuning Thresholds and Managing False Positives After a week of operation, review the distribution of anomaly scores from each model:\nindex=ai_inference sourcetype=jetson_inference earliest=-7d | stats count, avg(anomaly_score), 
perc90(anomaly_score), max(anomaly_score) by host Use this to calibrate thresholds. A well-tuned model should show:\nFewer than 5% of scored events above your correlation rule threshold Top-scoring events that represent genuinely unusual behavior on review A score distribution that is roughly stable day over day If too many events score above threshold, raise the threshold or retrain with a longer window (earliest=-30d) to give the model more baseline variation. If the score distribution shifts significantly week over week, retrain \u0026ndash; the model is drifting from your evolving environment.\nStep 6 \u0026ndash; Enriching ES Notable Events The three AI models are most valuable when their output is combined with existing ES context. A Zeek anomaly that coincides with a Suricata IDS alert on the same source IP is a very different priority from an isolated anomaly. Add correlation context to your notable events:\nindex=net_data sourcetype=\u0026#34;bro:conn:json\u0026#34; earliest=-15m latest=now | where isnotnull(duration) AND isnotnull(orig_bytes) | eval conn_state_enc=case(conn_state=\u0026#34;SF\u0026#34;,1,conn_state=\u0026#34;S0\u0026#34;,0,true(),7) | eval bytes_ratio=round(orig_bytes/(resp_bytes+1),4) | eval log_duration=if(duration\u0026gt;0,log(duration),0) | eval log_orig_bytes=if(orig_bytes\u0026gt;0,log(orig_bytes+1),0) | eval log_resp_bytes=if(resp_bytes\u0026gt;0,log(resp_bytes+1),0) | apply zeek_net_anomaly | where anomaly_score \u0026gt; 0.80 | eval src='id.orig_h', dest='id.resp_h', dest_port='id.resp_p' | lookup asset_lookup_by_str asset AS src OUTPUT priority AS src_priority, category AS src_category | eval signature=\u0026#34;AI_NETWORK_BEHAVIOR_ANOMALY\u0026#34; | eval severity=case( anomaly_score\u0026gt;0.90 AND src_priority=\u0026#34;critical\u0026#34;, \u0026#34;critical\u0026#34;, anomaly_score\u0026gt;0.90, \u0026#34;high\u0026#34;, anomaly_score\u0026gt;0.80, 
\u0026#34;medium\u0026#34;, true(), \u0026#34;low\u0026#34;) | table _time, src, dest, dest_port, anomaly_score, src_priority, src_category, severity, signature Conclusion You have deployed a working edge AI detection pipeline that covers three distinct attack surfaces using real security data your environment already produces. The Isolation Forest models are behavioral baselines \u0026ndash; they capture what normal looks like in your specific environment and score deviations against that baseline. Unlike signature-based detection, they will catch novel variants of known techniques because they detect behavioral patterns rather than specific indicators.\nThe architecture is maintainable. Models are retrained with a single SPL fit command. Thresholds are adjusted in correlation rule SPL. New data sources are added by writing a new training job and adding a stanza to containers.conf. The entire pipeline runs on hardware that costs a fraction of cloud GPU inference services, keeps your security data on your network, and integrates naturally into the Splunk workflow your team already uses.\nWhat comes next depends on your environment. Node 4 is available for a fourth use case \u0026ndash; the natural candidate is Suricata alert scoring: combining Suricata rule metadata with the Zeek anomaly scores from the same source IP and time window to prioritize which IDS alerts represent true positives versus background noise. 
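A rough sketch of that correlation, in the same style as the other jobs \u0026ndash; the suricata index and sourcetype, the model field on the scored inference events, and the Suricata field names (src_ip, dest_ip, alert.signature) are all assumptions to adapt to your environment:

```spl
index=suricata sourcetype=\u0026#34;suricata:alert\u0026#34; earliest=-15m
| rename src_ip AS src, dest_ip AS dest
| join type=left src
    [ search index=ai_inference sourcetype=jetson_inference model=zeek_net_anomaly earliest=-15m
      | stats max(anomaly_score) AS zeek_anomaly_score by src ]
| eval alert_priority=case(zeek_anomaly_score\u0026gt;0.80, \u0026#34;high\u0026#34;, isnotnull(zeek_anomaly_score), \u0026#34;medium\u0026#34;, true(), \u0026#34;review\u0026#34;)
| table _time, src, dest, alert.signature, zeek_anomaly_score, alert_priority
| sort -zeek_anomaly_score
```

The left join keeps every alert and only escalates those that coincide with a behavioral anomaly from the same source IP in the same window.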
That build would follow exactly the same pattern as the three jobs in this post.\nThe source code, Dockerfile, configuration files, and complete installation guide for this series are available in the companion repository.\nPart 1: Architecture and Concepts Part 2: Building the DSDL-Native Inference Container on Jetson Nano Part 3: Wiring the Pipeline \u0026ndash; DSDL Configuration, HEC, and Splunk Integration\n","permalink":"https://telemetry-forge.t-security.org/posts/edge-ai-secops-part4/","summary":"\u003ch3 id=\"introduction\"\u003eIntroduction\u003c/h3\u003e\n\u003cp\u003eThe first three parts of this series built the infrastructure. This final post uses it. You now have four Jetson Nano inference containers running HTTPS, DSDL wired to Splunk, and HEC delivering scored events back to your indexers. In this post, you will train three Isolation Forest models on real security data and build the ES correlation rules that turn AI-scored anomalies into actionable notable events.\u003c/p\u003e\n\u003cp\u003eThe three detection use cases are chosen for complementary coverage across the kill chain. Zeek connection log anomaly detection catches behavioral outliers in network traffic \u0026ndash; the kinds of connections that do not match your environment\u0026rsquo;s normal patterns. DNS anomaly detection catches tunneling, command-and-control beaconing, and DGA callbacks that generate no network flow anomalies because they hide inside legitimate DNS traffic. 
Windows authentication anomaly detection catches credential abuse, brute force, and lateral movement patterns that look like legitimate events when examined individually but are anomalous in aggregate.\u003c/p\u003e","title":"Real Security Data: Training and Deploying Anomaly Detection Models (Part 4 of 4)"},{"content":"Ted Skinner Security architect and SOC engineer with a focus on detection engineering, security telemetry, and operationalizing machine learning in security operations.\nThe Telemetry Forge documents builds done in a real security lab \u0026ndash; not cloud demos, not toy datasets. Every post is written at the depth a senior security engineer needs to actually replicate the work, including what broke and why.\nLab Environment Splunk Enterprise 10.0 with Enterprise Security 4-node NVIDIA Jetson Nano cluster (JetPack 4.6.6) Zeek IDS, Suricata, Splunk Stream Windows event log collection Custom edge AI inference pipeline (DSDL 5.2.3) Topics Covered Detection engineering and SIEM architecture Edge AI and machine learning for security operations Security telemetry pipeline design Threat hunting with Splunk SPL SOC automation and workflow engineering Contact Reach out via LinkedIn or GitHub.\n","permalink":"https://telemetry-forge.t-security.org/about/","summary":"\u003ch2 id=\"ted-skinner\"\u003eTed Skinner\u003c/h2\u003e\n\u003cp\u003eSecurity architect and SOC engineer with a focus on detection engineering, security telemetry, and operationalizing machine learning in security operations.\u003c/p\u003e\n\u003cp\u003eThe Telemetry Forge documents builds done in a real security lab \u0026ndash; not cloud demos, not toy datasets. 
Every post is written at the depth a senior security engineer needs to actually replicate the work, including what broke and why.\u003c/p\u003e\n\u003ch2 id=\"lab-environment\"\u003eLab Environment\u003c/h2\u003e\n\u003cul\u003e\n\u003cli\u003eSplunk Enterprise 10.0 with Enterprise Security\u003c/li\u003e\n\u003cli\u003e4-node NVIDIA Jetson Nano cluster (JetPack 4.6.6)\u003c/li\u003e\n\u003cli\u003eZeek IDS, Suricata, Splunk Stream\u003c/li\u003e\n\u003cli\u003eWindows event log collection\u003c/li\u003e\n\u003cli\u003eCustom edge AI inference pipeline (DSDL 5.2.3)\u003c/li\u003e\n\u003c/ul\u003e\n\u003ch2 id=\"topics-covered\"\u003eTopics Covered\u003c/h2\u003e\n\u003cul\u003e\n\u003cli\u003eDetection engineering and SIEM architecture\u003c/li\u003e\n\u003cli\u003eEdge AI and machine learning for security operations\u003c/li\u003e\n\u003cli\u003eSecurity telemetry pipeline design\u003c/li\u003e\n\u003cli\u003eThreat hunting with Splunk SPL\u003c/li\u003e\n\u003cli\u003eSOC automation and workflow engineering\u003c/li\u003e\n\u003c/ul\u003e\n\u003ch2 id=\"contact\"\u003eContact\u003c/h2\u003e\n\u003cp\u003eReach out via \u003ca href=\"https://www.linkedin.com/in/tedskinnercissp/\"\n   \n    target=\"_blank\" rel=\"noopener noreferrer\"\u003eLinkedIn\u003c/a\u003e or \u003ca href=\"https://github.com/tskinnerarlo\"\n   \n    target=\"_blank\" rel=\"noopener noreferrer\"\u003eGitHub\u003c/a\u003e.\u003c/p\u003e","title":"About"}]