From Kubernetes Dreams to Docker Reality: Building an ML Inference Cluster on Jetson Nano
I set out with a clear goal: stand up a four-node Jetson Nano cluster running k3s, connect it to Splunk’s Deep Learning Toolkit (DLTK), and use the Nano GPUs to serve ML model inference for security analytics use cases — specifically DNS tunneling detection and lateral movement identification from host-based firewall data.
The plan was reasonable on paper. Kubernetes gives you scheduling, self-healing, and a clean abstraction over bare hardware. Jetson Nanos are purpose-built for edge AI workloads. Splunk DLTK provides a framework for connecting ML models to security event streams. What could go wrong?
Quite a bit, as it turns out. This post walks through what I discovered, why the kernel mattered more than I expected, and how I landed on a cleaner architecture that actually works.
The Original Vision: k3s on Jetson Nano
The initial plan was straightforward:
- Four Jetson Nano Dev Kits running off SD cards, organized as a k3s cluster
- One control node managing three worker nodes
- ML inference containers deployed across the cluster via standard Kubernetes workloads
- Splunk on a separate, more capable host calling the cluster endpoints for model scoring
The pitch was appealing. Kubernetes lets you treat your inference nodes as a pool of resources rather than individual machines. You get load balancing, rolling updates, and the ability to schedule different models across nodes without manual coordination. For a home lab with security research ambitions, it felt like the right architectural foundation to build on.
The Jetson Nano was marketed for exactly this kind of workload — edge AI, containerized inference, IoT and robotics applications. It was purchased with those use cases in mind, roughly a year and a half before I started this project. Nothing about the original pitch was misleading.
But Kubernetes has evolved quickly, and the gap between what the Nano’s kernel can support and what modern K8s networking stacks expect turned out to be the central problem.
Hitting the Wall: Kernel 4.9 and Modern Kubernetes
The first sign of trouble came during networking troubleshooting — missing kernel modules, iptables/nftables conflicts, and incompatibilities that kept resurfacing.
Running the diagnostic commands on the control node told the full story:
cat /etc/nv_tegra_release
# R32 (release), REVISION: 7.6, GCID: 38171779, BOARD: t210ref, EABI: aarch64
uname -r
# 4.9.337-tegra
python3 --version
# Python 3.6.9
free -h
# Mem: 3.9G total 2.2G used 491M free
# Swap: 1.9G total 132M used
The Jetson Nano Dev Kit runs L4T R32.7.6 — NVIDIA’s Linux for Tegra release tied to JetPack 4.6.x. That means:
- Ubuntu 18.04
- Python 3.6
- Kernel 4.9.337-tegra
Kernel 4.9 is from 2016. Modern Kubernetes networking stacks have moved well past it.
What Kernel 4.9 Cannot Do
The specific failures I ran into trace directly to kernel capability gaps:
| Feature | Required Kernel | Nano’s Kernel |
|---|---|---|
| eBPF (Cilium) | 5.4+ | 4.9 ❌ |
| Full nftables support | 5.x | 4.9 ❌ |
| kube-proxy replacement | 5.4+ | 4.9 ❌ |
| Modern netfilter modules | 5.x | 4.9 ❌ |
| xt_nfacct and related modules | varies | missing ❌ |
Every time I pushed toward a more capable networking configuration — Cilium, nftables mode, kube-proxy replacement — I hit the same wall: the kernel didn’t have the primitives those components expected.
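You can see the gap directly on the node. A minimal check, assuming the stock L4T 4.9 kernel and using xt_nfacct as the example from my errors:
# Search the installed kernel's module tree for the module the CNI expects
find /lib/modules/$(uname -r) -name 'xt_nfacct*'
# Try to load it; on the stock L4T kernel this fails because the module was never built
sudo modprobe xt_nfacct
# See which netfilter/xtables modules actually shipped
ls /lib/modules/$(uname -r)/kernel/net/netfilter/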
Can You Upgrade the Kernel?
This was the natural next question. The answer is: not to 5.15, not officially, not without breaking what makes a Jetson a Jetson.
How Jetson Kernels Work
On a standard PC or ARM SBC, you can often swap kernels independently of the OS. On Jetson, the kernel is tightly coupled to the L4T userspace stack:
- The kernel includes NVIDIA’s custom GPU drivers
- Device Tree Blobs (DTBs) are module-specific and tied to the L4T release
- Firmware blobs, camera pipeline (CSI), and hardware acceleration (NVENC/NVDEC) all depend on the kernel version
- The bootloader is part of the same release bundle
Installing a generic Ubuntu 5.15 kernel on a Nano will, at best, fail to boot. At worst, you get a system that boots but has no GPU, no camera, broken networking, or device tree mismatches you’ll spend days debugging.
The Upgrade Path Table
| Module | Max JetPack | Kernel | Path to 5.15 |
|---|---|---|---|
| Jetson Nano (Dev Kit) | JetPack 4.x | 4.9 | Not officially supported |
| Jetson Xavier NX | JetPack 5 / 6* | 5.10 / 5.15* | Possible on select SKUs |
| Jetson Orin Nano | JetPack 6 | 5.15 | Fully supported |
| Jetson Orin NX | JetPack 6 | 5.15 | Fully supported |
The Nano’s SoC is the Tegra X1 (t210 reference board). NVIDIA’s last official support line for that chip is JetPack 4.x. There is no supported upgrade path to kernel 5.15 that preserves the NVIDIA acceleration stack.
You can build a mainline kernel for Nano. People have done it. But you lose CUDA, the camera pipeline, and hardware acceleration — exactly the features I bought a Jetson for.
The Honest Answer for My Lab
For a k3s cluster where the primary value is GPU-accelerated inference, building a mainline kernel to chase Cilium compatibility would be a self-defeating exercise. I’d have four ARM64 nodes with no GPU acceleration running a networking stack that still might have other compatibility issues.
What k3s on Nano Actually Supports
Once I accepted the kernel constraint, the picture became clearer. k3s does work on Nano — it just needs to be configured conservatively.
What works on kernel 4.9:
- k3s with default flannel CNI
- Calico with tuning
- iptables-legacy mode
- Standard Kubernetes workloads at modest scale
What doesn’t work on kernel 4.9:
- Cilium (requires 5.4+ for eBPF)
- kube-proxy replacement mode
- Advanced nftables configurations
- Some Kubernetes 1.29+ networking features
The fix for the iptables issues I was seeing was to force legacy mode:
sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
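To confirm the switch took effect, and to cover IPv6 as well, something like the following works; the ip6tables alternatives group is an assumption on my part, so the fallback is harmless if your image doesn't have it:
# Newer iptables builds report the active backend ("legacy" vs "nf_tables") in the version string
iptables --version
# Switch ip6tables too, if the alternatives group exists on your image
sudo update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy 2>/dev/null || true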
This works. But it also highlights the core architectural question: if I'm running a stripped-down k3s configuration that avoids most of what makes modern Kubernetes networking interesting, what exactly is Kubernetes buying me here?
Rethinking the Architecture: What Am I Actually Solving?
This is where the troubleshooting process forced a useful question. I stepped back and asked what Kubernetes was actually supposed to do for this use case.
The original reasoning:
- Scheduling: distribute inference workloads across nodes
- Self-healing: restart failed containers automatically
- Load balancing: spread requests across available nodes
- Rolling updates: deploy new model versions without downtime
For a dynamic, general-purpose cluster, these are legitimate reasons to run Kubernetes. But my use case is different. I’m not running a general workload — I’m running fixed-function inference endpoints. Each node serves one or two models. The “scheduling” decision is essentially static. The “load balancing” is a round-robin list of four IP addresses. The “rolling updates” could be handled by restarting a Docker container.
Kubernetes adds real overhead:
- etcd for cluster state
- kubelet on every node
- kube-proxy and CNI plugins
- Overlay networking (flannel or equivalent)
- The control plane itself consuming RAM and CPU on a 4GB device already running at 2.2GB used
On hardware this constrained, that overhead is not free. And on kernel 4.9, most of the advanced features that make the overhead worthwhile aren’t available anyway.
The Decision: Docker-Only, Fixed-Function Nodes
The conclusion was straightforward once I framed it correctly:
For fixed-function ML inference on constrained hardware, Docker-only is more stable, more efficient, and easier to reason about than Kubernetes.
The new architecture:
- All four Nanos treated as equal inference workers — no control-plane/worker distinction
- One Docker container per node running a stateless scoring API
- Splunk host handles orchestration, feature extraction, and result write-back
- Round-robin load balancing from the Splunk custom search command — a list of four IPs
This isn’t a compromise. For this specific use case, it’s the right architecture.
The Migration Path: k3s Out, Docker In
Step 1: Remove k3s From All Nodes
Run on every node (control and workers):
sudo /usr/local/bin/k3s-uninstall.sh 2>/dev/null || true
sudo /usr/local/bin/k3s-agent-uninstall.sh 2>/dev/null || true
# Clean residuals
sudo rm -rf /etc/rancher /var/lib/rancher /var/lib/kubelet /etc/cni /var/lib/cni
sudo ip link delete cni0 2>/dev/null || true
sudo ip link delete flannel.1 2>/dev/null || true
sudo reboot
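Once the nodes come back up, a quick sanity check confirms nothing from k3s survived the cleanup (a minimal check; adjust the interface names if you used a different CNI):
# k3s binary and processes should be gone
command -v k3s || echo "k3s binary removed"
ps aux | grep "[k]3s" || echo "no k3s processes"
# No leftover CNI interfaces
ip link show | grep -E "cni0|flannel" || echo "no leftover CNI interfaces"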
Step 2: Install Docker
On each node:
sudo apt update
sudo apt install -y docker.io
sudo systemctl enable docker
sudo systemctl start docker
sudo usermod -aG docker $USER
Verify with docker run hello-world after logging back in.
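For GPU-accelerated inference later, the containers will need the NVIDIA container runtime. JetPack 4.6.x images normally ship it already; if yours does, the standard NVIDIA-documented config below makes it the default runtime (assumption: your /etc/docker/daemon.json has no other custom settings you'd be overwriting):
# Check whether Docker already knows about the nvidia runtime
docker info | grep -i runtime

# Make nvidia the default runtime so containers get GPU access without extra flags
sudo tee /etc/docker/daemon.json > /dev/null <<'EOF'
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}
EOF
sudo systemctl restart docker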
Step 3: Optimize Memory
With 2.2GB already in use at idle, headroom matters for model loading.
Disable the GUI to recover RAM:
sudo systemctl set-default multi-user.target
sudo reboot
Expand swap (helps prevent OOM during model load):
sudo swapoff -a
sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
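After the reboot, a quick check confirms both changes took effect:
free -h                  # swap should now show 8G; idle RAM use should be lower with the GUI off
swapon --show            # confirms /swapfile is active
systemctl get-default    # should report multi-user.target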
The Inference Service Design
Each Nano runs a lightweight stateless HTTP service. The initial version uses heuristic rules while real ONNX models are being trained and validated.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/health")
def health():
    return {"status": "ok"}

@app.route("/score", methods=["POST"])
def score():
    data = request.json
    features = data.get("features", {})
    score = 0.0
    if features.get("uniq_qnames", 0) > 100:
        score += 0.5
    if features.get("avg_qlen", 0) > 60:
        score += 0.3
    if features.get("avg_pct_base32", 0) > 50:
        score += 0.2
    return jsonify({"score": score})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
The API contract is intentionally simple and model-agnostic:
// Request
{
  "model": "dns_tunnel_v1",
  "entity": {"src_ip": "10.1.2.3"},
  "features": {
    "qpm": 120,
    "uniq_qnames": 118,
    "avg_qlen": 74.2,
    "avg_pct_base32": 62.1
  }
}

// Response
{
  "model": "dns_tunnel_v1",
  "score": 0.93,
  "label": "suspicious",
  "reason_codes": ["high_unique_qnames", "long_qnames", "high_entropy_like"]
}
This contract stays stable whether the backend is a heuristic function, an ONNX model, or a TensorRT-optimized pipeline. Splunk doesn’t care what’s behind the endpoint.
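Getting the service onto a node is deliberately boring. Here's a sketch of the deployment and a smoke test, assuming a Dockerfile that wraps the Flask app above, with hypothetical image names and IPs:
# Build and run the scoring container (hypothetical tag; Dockerfile assumed to exist alongside app.py)
docker build -t nano-scorer:0.1 .
docker run -d --name scorer --restart unless-stopped -p 8000:8000 nano-scorer:0.1

# Smoke test from the Splunk host (hypothetical Nano IP)
curl -s http://192.168.1.101:8000/health
curl -s -X POST http://192.168.1.101:8000/score \
    -H "Content-Type: application/json" \
    -d '{"model": "dns_tunnel_v1", "features": {"uniq_qnames": 118, "avg_qlen": 74.2, "avg_pct_base32": 62.1}}'
With the heuristic scorer above, that payload should come back with a score of 1.0, which is all the end-to-end validation the first iteration needs.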
The Splunk Integration
The goal of this lab work is to learn how to run Splunk's Deep Learning Toolkit (DLTK) locally in my lab and to get hands-on experience with the supporting technologies. I'll use the DLTK app to configure which Nano endpoints to target by defining target “environments” in the app, which determine which Nanos its fit and apply commands are routed to.
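Until that wiring is in place, here is a minimal sketch of the round-robin dispatch the Splunk-side command needs, in plain Python with the requests library; the IPs are hypothetical and none of the Splunk SDK plumbing is shown:
import itertools
import requests

# Hypothetical node list; in practice this comes from the app's "environment" configuration
NODES = ["192.168.1.101", "192.168.1.102", "192.168.1.103", "192.168.1.104"]
_node_cycle = itertools.cycle(NODES)

def score_event(model, entity, features, timeout=2.0):
    """Send one feature vector to the next Nano in round-robin order."""
    node = next(_node_cycle)
    resp = requests.post(
        "http://{}:8000/score".format(node),
        json={"model": model, "entity": entity, "features": features},
        timeout=timeout,
    )
    resp.raise_for_status()
    return resp.json()

# Example call using the same shape as the API contract above
print(score_event("dns_tunnel_v1", {"src_ip": "10.1.2.3"},
                  {"qpm": 120, "uniq_qnames": 118, "avg_qlen": 74.2, "avg_pct_base32": 62.1}))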
Why This Architecture Is Actually Better
Looking back, the Kubernetes path would have worked — but at significant ongoing cost.
Running k3s on kernel 4.9 with legacy iptables and no Cilium means accepting a second-class networking experience on hardware that’s already constrained. Every Kubernetes upgrade brings risk of new incompatibilities. Every new CNI feature has to be evaluated against what kernel 4.9 actually supports. The control plane itself consumes resources on a device with 4GB of shared CPU/GPU RAM.
The Docker-only approach:
- No overlay networking — just Docker’s default bridge and host-reachable ports
- No etcd, kubelet, or kube-proxy — resources go to inference, not cluster management
- No kernel module compatibility surface — one less category of things that can break
- Simpler mental model — four IP addresses, four Docker containers, round-robin from Splunk
- Easier to upgrade — docker pull, docker stop, docker run is the entire deployment workflow
When Kubernetes genuinely makes sense for this cluster (autoscaling, dynamic model scheduling, multi-tenant workloads), the right move is to upgrade to hardware that supports it properly — Orin Nano or Orin NX modules running JetPack 6 and kernel 5.15. Not to fight kernel 4.9’s limitations indefinitely.
What’s Next
The immediate roadmap:
- Build the feature tables in Splunk — DNS query features and host firewall features stored to a summary index
- Validate the end-to-end loop with the heuristic scorer before introducing real models
- Export real models to ONNX on the Splunk host (IsolationForest for anomaly detection, XGBoost for classification once labeled data is available)
- Deploy ONNX Runtime (JetPack 4.6.x Jetson build) on each Nano and replace the heuristic
- Add production hardening — retries, node health checks, FastAPI instead of Flask, systemd auto-start
The use cases I’m targeting first:
- DNS tunneling detection using query entropy, subdomain depth, character class ratios, and behavioral volume features per source IP
- Lateral movement detection from host firewall logs using destination fan-out, admin protocol port hits, and east-west traffic patterns
Both map cleanly to the feature-extraction-in-Splunk → score-on-Nano → write-back-to-Splunk pattern. The Nanos are well within their operating envelope for tabular anomaly detection models at this scale.
Lessons Learned
On hardware selection: Jetson Nano is capable edge AI hardware. But “edge AI” and “modern Kubernetes networking” are not the same requirement. Know which one you’re actually solving for before choosing your platform.
On kernel constraints: On Jetson, the kernel is not independently upgradeable. The L4T release is an integrated stack. If your use case requires a specific kernel version, that requirement flows upward to hardware selection.
On Kubernetes: K8s is the right answer for many distributed workload problems. It is not automatically the right answer for fixed-function, stateless inference endpoints on constrained hardware. Match the tool to the problem.
On architecture evolution: Starting with heuristic rules and a clean API contract lets you validate the end-to-end flow before committing to a specific model framework. The Nano doesn’t care whether it’s running a Flask heuristic or an ONNX model — and neither does Splunk.
This post is part of the TelemetryForge series on edge security analytics. Follow along as I build out the DNS tunneling and lateral movement detection use cases in future posts.