From Kubernetes Dreams to Docker Reality: Building an ML Inference Cluster on Jetson Nano
I set out with a clear goal: stand up a four-node Jetson Nano cluster running k3s, connect it to Splunk’s Deep Learning Toolkit (DLTK), and use the Nano GPUs to serve ML model inference for security analytics use cases — specifically DNS tunneling detection and lateral movement identification from host-based firewall data.
The plan was reasonable on paper. Kubernetes gives you scheduling, self-healing, and a clean abstraction over bare hardware. Jetson Nanos are purpose-built for edge AI workloads. Splunk DLTK provides a framework for connecting ML models to security event streams. What could go wrong?
Quite a bit, as it turns out. This post walks through what I discovered, why the kernel mattered more than I expected, and how I landed on a cleaner architecture that actually works.
The Original Vision: k3s on Jetson Nano
The initial plan was straightforward:
- Four Jetson Nano Dev Kits running off SD cards, organized as a k3s cluster
- One control node managing three worker nodes
- ML inference containers deployed across the cluster via standard Kubernetes workloads
- Splunk on a separate, more capable host calling the cluster endpoints for model scoring
The pitch was appealing. Kubernetes lets you treat your inference nodes as a pool of resources rather than individual machines. You get load balancing, rolling updates, and the ability to schedule different models across nodes without manual coordination. For a home lab with security research ambitions, it felt like the right architectural foundation to build on.
The Jetson Nano was marketed for exactly this kind of workload — edge AI, containerized inference, IoT and robotics applications. It was purchased with those use cases in mind, roughly a year and a half before I started this project. Nothing about the original pitch was misleading.
But Kubernetes has evolved quickly, and the gap between what the Nano’s kernel can support and what modern K8s networking stacks expect turned out to be the central problem.
Hitting the Wall: Kernel 4.9 and Modern Kubernetes
The first sign of trouble came during networking troubleshooting — missing kernel modules, iptables/nftables conflicts, and incompatibilities that kept resurfacing.
Running the diagnostic commands on the control node told the full story:
cat /etc/nv_tegra_release
# R32 (release), REVISION: 7.6, GCID: 38171779, BOARD: t210ref, EABI: aarch64
uname -r
# 4.9.337-tegra
python3 --version
# Python 3.6.9
free -h
# Mem: 3.9G total 2.2G used 491M free
# Swap: 1.9G total 132M used
The Jetson Nano Dev Kit runs L4T R32.7.6 — NVIDIA’s Linux for Tegra release tied to JetPack 4.6.x. That means:
- Ubuntu 18.04
- Python 3.6
- Kernel 4.9.337-tegra
Kernel 4.9 is from 2016. Modern Kubernetes networking stacks have moved well past it.
What Kernel 4.9 Cannot Do
The specific failures I ran into trace directly to kernel capability gaps:
| Feature | Required Kernel | Nano’s Kernel |
|---|---|---|
| eBPF (Cilium) | 5.4+ | 4.9 ❌ |
| Full nftables support | 5.x | 4.9 ❌ |
| kube-proxy replacement | 5.4+ | 4.9 ❌ |
| Modern netfilter modules | 5.x | 4.9 ❌ |
| xt_nfacct and related modules | varies | missing ❌ |
Every time I pushed toward a more capable networking configuration — Cilium, nftables mode, kube-proxy replacement — I hit the same wall: the kernel didn’t have the primitives those components expected.
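You can see the gap directly on the node. A minimal check, assuming the stock L4T 4.9 kernel and using xt_nfacct as the example from my errors:
# Search the installed kernel's module tree for the module the CNI expects
find /lib/modules/$(uname -r) -name 'xt_nfacct*'
# Try to load it; on the stock L4T kernel this fails because the module was never built
sudo modprobe xt_nfacct
# See which netfilter/xtables modules actually shipped
ls /lib/modules/$(uname -r)/kernel/net/netfilter/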
Can You Upgrade the Kernel?
This was the natural next question. The answer is: not to 5.15, not officially, not without breaking what makes a Jetson a Jetson.
How Jetson Kernels Work
On a standard PC or ARM SBC, you can often swap kernels independently of the OS. On Jetson, the kernel is tightly coupled to the L4T userspace stack:
- The kernel includes NVIDIA’s custom GPU drivers
- Device Tree Blobs (DTBs) are module-specific and tied to the L4T release
- Firmware blobs, camera pipeline (CSI), and hardware acceleration (NVENC/NVDEC) all depend on the kernel version
- The bootloader is part of the same release bundle
Installing a generic Ubuntu 5.15 kernel on a Nano will, at best, fail to boot. At worst, you get a system that boots but has no GPU, no camera, broken networking, or device tree mismatches you’ll spend days debugging.
The Upgrade Path Table
| Module | Max JetPack | Kernel | Path to 5.15 |
|---|---|---|---|
| Jetson Nano (Dev Kit) | JetPack 4.x | 4.9 | Not officially supported |
| Jetson Xavier NX | JetPack 5 / 6* | 5.10 / 5.15* | Possible on select SKUs |
| Jetson Orin Nano | JetPack 6 | 5.15 | Fully supported |
| Jetson Orin NX | JetPack 6 | 5.15 | Fully supported |
The Nano’s SoC is the Tegra X1 (t210 reference board). NVIDIA’s last official support line for that chip is JetPack 4.x. There is no supported upgrade path to kernel 5.15 that preserves the NVIDIA acceleration stack.
You can build a mainline kernel for Nano. People have done it. But you lose CUDA, the camera pipeline, and hardware acceleration — exactly the features I bought a Jetson for.
The Honest Answer for My Lab
For a k3s cluster where the primary value is GPU-accelerated inference, building a mainline kernel to chase Cilium compatibility would be a self-defeating exercise. I’d have four ARM64 nodes with no GPU acceleration running a networking stack that still might have other compatibility issues.
What k3s on Nano Actually Supports
Once I accepted the kernel constraint, the picture became clearer. k3s does work on Nano — it just needs to be configured conservatively.
What works on kernel 4.9:
- k3s with default flannel CNI
- Calico with tuning
- iptables-legacy mode
- Standard Kubernetes workloads at modest scale
What doesn’t work on kernel 4.9:
- Cilium (requires 5.4+ for eBPF)
- kube-proxy replacement mode
- Advanced nftables configurations
- Some Kubernetes 1.29+ networking features
The fix for the iptables issues I was seeing was to force legacy mode:
sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
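To confirm the switch took effect, and to cover IPv6 as well, something like the following works; the ip6tables alternatives group is an assumption on my part, so the fallback is harmless if your image doesn't have it:
# Newer iptables builds report the active backend ("legacy" vs "nf_tables") in the version string
iptables --version
# Switch ip6tables too, if the alternatives group exists on your image
sudo update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy 2>/dev/null || true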
This works. But it also highlights the core architectural question: if I'm running a stripped-down k3s configuration that avoids most of what makes modern Kubernetes networking interesting, what exactly is Kubernetes buying me here?
Rethinking the Architecture: What Am I Actually Solving?
This is where the troubleshooting process forced a useful question. I stepped back and asked what Kubernetes was actually supposed to do for this use case.
The original reasoning:
- Scheduling: distribute inference workloads across nodes
- Self-healing: restart failed containers automatically
- Load balancing: spread requests across available nodes
- Rolling updates: deploy new model versions without downtime
For a dynamic, general-purpose cluster, these are legitimate reasons to run Kubernetes. But my use case is different. I’m not running a general workload — I’m running fixed-function inference endpoints. Each node serves one or two models. The “scheduling” decision is essentially static. The “load balancing” is a round-robin list of four IP addresses. The “rolling updates” could be handled by restarting a Docker container.
Kubernetes adds real overhead:
- etcd for cluster state
- kubelet on every node
- kube-proxy and CNI plugins
- Overlay networking (flannel or equivalent)
- The control plane itself consuming RAM and CPU on a 4GB device already running at 2.2GB used
On hardware this constrained, that overhead is not free. And on kernel 4.9, most of the advanced features that make the overhead worthwhile aren’t available anyway.
The Decision: Docker-Only, Fixed-Function Nodes
The conclusion was straightforward once I framed it correctly:
For fixed-function ML inference on constrained hardware, Docker-only is more stable, more efficient, and easier to reason about than Kubernetes.
The new architecture:
- All four Nanos treated as equal inference workers — no control-plane/worker distinction
- One Docker container per node running a stateless scoring API
- Splunk host handles orchestration, feature extraction, and result write-back
- Round-robin load balancing from the Splunk custom search command — a list of four IPs
This isn’t a compromise. For this specific use case, it’s the right architecture.
The Migration Path: k3s Out, Docker In
Step 1: Remove k3s From All Nodes
Run on every node (control and workers):
sudo /usr/local/bin/k3s-uninstall.sh 2>/dev/null || true
sudo /usr/local/bin/k3s-agent-uninstall.sh 2>/dev/null || true
# Clean residuals
sudo rm -rf /etc/rancher /var/lib/rancher /var/lib/kubelet /etc/cni /var/lib/cni
sudo ip link delete cni0 2>/dev/null || true
sudo ip link delete flannel.1 2>/dev/null || true
sudo reboot
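Once the nodes come back up, a quick sanity check confirms nothing from k3s survived the cleanup (a minimal check; adjust the interface names if you used a different CNI):
# k3s binary and processes should be gone
command -v k3s || echo "k3s binary removed"
ps aux | grep "[k]3s" || echo "no k3s processes"
# No leftover CNI interfaces
ip link show | grep -E "cni0|flannel" || echo "no leftover CNI interfaces"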
Step 2: Install Docker
On each node:
sudo apt update
sudo apt install -y docker.io
sudo systemctl enable docker
sudo systemctl start docker
sudo usermod -aG docker $USER
Verify with docker run hello-world after logging back in.
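For GPU-accelerated inference later, the containers will need the NVIDIA container runtime. JetPack 4.6.x images normally ship it already; if yours does, the standard NVIDIA-documented config below makes it the default runtime (assumption: your /etc/docker/daemon.json has no other custom settings you'd be overwriting):
# Check whether Docker already knows about the nvidia runtime
docker info | grep -i runtime

# Make nvidia the default runtime so containers get GPU access without extra flags
sudo tee /etc/docker/daemon.json > /dev/null <<'EOF'
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}
EOF
sudo systemctl restart docker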
Step 3: Optimize Memory
With 2.2GB already in use at idle, headroom matters for model loading.
Disable the GUI to recover RAM:
sudo systemctl set-default multi-user.target
sudo reboot
Expand swap (helps prevent OOM during model load):
sudo swapoff -a
sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
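After the reboot, a quick check confirms both changes took effect:
free -h                  # swap should now show 8G; idle RAM use should be lower with the GUI off
swapon --show            # confirms /swapfile is active
systemctl get-default    # should report multi-user.target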
The Inference Service Design
Each Nano runs a lightweight stateless HTTP service. The initial version uses heuristic rules while real ONNX models are being trained and validated.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/health")
def health():
    return {"status": "ok"}

@app.route("/score", methods=["POST"])
def score():
    data = request.json
    features = data.get("features", {})
    score = 0.0
    if features.get("uniq_qnames", 0) > 100:
        score += 0.5
    if features.get("avg_qlen", 0) > 60:
        score += 0.3
    if features.get("avg_pct_base32", 0) > 50:
        score += 0.2
    return jsonify({"score": score})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
The API contract is intentionally simple and model-agnostic:
// Request
{
  "model": "dns_tunnel_v1",
  "entity": {"src_ip": "10.1.2.3"},
  "features": {
    "qpm": 120,
    "uniq_qnames": 118,
    "avg_qlen": 74.2,
    "avg_pct_base32": 62.1
  }
}

// Response
{
  "model": "dns_tunnel_v1",
  "score": 0.93,
  "label": "suspicious",
  "reason_codes": ["high_unique_qnames", "long_qnames", "high_entropy_like"]
}
This contract stays stable whether the backend is a heuristic function, an ONNX model, or a TensorRT-optimized pipeline. Splunk doesn’t care what’s behind the endpoint.
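Getting the service onto a node is deliberately boring. Here's a sketch of the deployment and a smoke test, assuming a Dockerfile that wraps the Flask app above, with hypothetical image names and IPs:
# Build and run the scoring container (hypothetical tag; Dockerfile assumed to exist alongside app.py)
docker build -t nano-scorer:0.1 .
docker run -d --name scorer --restart unless-stopped -p 8000:8000 nano-scorer:0.1

# Smoke test from the Splunk host (hypothetical Nano IP)
curl -s http://192.168.1.101:8000/health
curl -s -X POST http://192.168.1.101:8000/score \
    -H "Content-Type: application/json" \
    -d '{"model": "dns_tunnel_v1", "features": {"uniq_qnames": 118, "avg_qlen": 74.2, "avg_pct_base32": 62.1}}'
With the heuristic scorer above, that payload should come back with a score of 1.0, which is all the end-to-end validation the first iteration needs.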
The Splunk Integration
The goal of this lab work is to learn how to run Splunk's Deep Learning Toolkit (DLTK) locally in my lab and to get hands-on experience with the supporting technologies. I'll use the DLTK app to configure which Nano endpoints to target by defining target “environments” in the app, which determine which Nanos its fit and apply commands are routed to.
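Until that wiring is in place, here is a minimal sketch of the round-robin dispatch the Splunk-side command needs, in plain Python with the requests library; the IPs are hypothetical and none of the Splunk SDK plumbing is shown:
import itertools
import requests

# Hypothetical node list; in practice this comes from the app's "environment" configuration
NODES = ["192.168.1.101", "192.168.1.102", "192.168.1.103", "192.168.1.104"]
_node_cycle = itertools.cycle(NODES)

def score_event(model, entity, features, timeout=2.0):
    """Send one feature vector to the next Nano in round-robin order."""
    node = next(_node_cycle)
    resp = requests.post(
        "http://{}:8000/score".format(node),
        json={"model": model, "entity": entity, "features": features},
        timeout=timeout,
    )
    resp.raise_for_status()
    return resp.json()

# Example call using the same shape as the API contract above
print(score_event("dns_tunnel_v1", {"src_ip": "10.1.2.3"},
                  {"qpm": 120, "uniq_qnames": 118, "avg_qlen": 74.2, "avg_pct_base32": 62.1}))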
Why This Architecture Is Actually Better
Looking back, the Kubernetes path would have worked — but at significant ongoing cost.
Running k3s on kernel 4.9 with legacy iptables and no Cilium means accepting a second-class networking experience on hardware that’s already constrained. Every Kubernetes upgrade brings risk of new incompatibilities. Every new CNI feature has to be evaluated against what kernel 4.9 actually supports. The control plane itself consumes resources on a device with 4GB of shared CPU/GPU RAM.
The Docker-only approach:
- No overlay networking — just Docker’s default bridge and host-reachable ports
- No etcd, kubelet, or kube-proxy — resources go to inference, not cluster management
- No kernel module compatibility surface — one less category of things that can break
- Simpler mental model — four IP addresses, four Docker containers, round-robin from Splunk
- Easier to upgrade — docker pull, docker stop, docker run is the entire deployment workflow
When Kubernetes genuinely makes sense for this cluster (autoscaling, dynamic model scheduling, multi-tenant workloads), the right move is to upgrade to hardware that supports it properly — Orin Nano or Orin NX modules running JetPack 6 and kernel 5.15. Not to fight kernel 4.9’s limitations indefinitely.
What’s Next
The immediate roadmap:
- Build the feature tables in Splunk — DNS query features and host firewall features stored to a summary index
- Validate the end-to-end loop with the heuristic scorer before introducing real models
- Export real models to ONNX on the Splunk host (IsolationForest for anomaly detection, XGBoost for classification once labeled data is available)
- Deploy ONNX Runtime (JetPack 4.6.x Jetson build) on each Nano and replace the heuristic
- Add production hardening — retries, node health checks, FastAPI instead of Flask, systemd auto-start
The use cases I’m targeting first:
- DNS tunneling detection using query entropy, subdomain depth, character class ratios, and behavioral volume features per source IP
- Lateral movement detection from host firewall logs using destination fan-out, admin protocol port hits, and east-west traffic patterns
Both map cleanly to the feature-extraction-in-Splunk → score-on-Nano → write-back-to-Splunk pattern. The Nanos are well within their operating envelope for tabular anomaly detection models at this scale.
Lessons Learned
On hardware selection: Jetson Nano is capable edge AI hardware. But “edge AI” and “modern Kubernetes networking” are not the same requirement. Know which one you’re actually solving for before choosing your platform.
On kernel constraints: On Jetson, the kernel is not independently upgradeable. The L4T release is an integrated stack. If your use case requires a specific kernel version, that requirement flows upward to hardware selection.
On Kubernetes: K8s is the right answer for many distributed workload problems. It is not automatically the right answer for fixed-function, stateless inference endpoints on constrained hardware. Match the tool to the problem.
On architecture evolution: Starting with heuristic rules and a clean API contract lets you validate the end-to-end flow before committing to a specific model framework. The Nano doesn’t care whether it’s running a Flask heuristic or an ONNX model — and neither does Splunk.
This post is part of the TelemetryForge series on edge security analytics. Follow along as I build out the DNS tunneling and lateral movement detection use cases in future posts.