Migration from Proxmox to Talos Linux

Why Leave Proxmox?

After running Proxmox for several years, I decided it was time for a fresh approach. Don’t get me wrong - Proxmox is excellent. But I wanted:

The New Stack

Talos Linux

Talos is an OS designed specifically for Kubernetes. Key features:
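
There is no SSH and no package manager on the node; everything is driven through the Talos API with talosctl. As a rough illustration of day-to-day management (using this lab’s node IP, once the talosconfig generated later is in place):

# All management goes through the Talos API - no shell on the node itself
talosctl --nodes 192.168.1.100 version     # OS and API version of the node
talosctl --nodes 192.168.1.100 services    # health of system services (etcd, kubelet, ...)
talosctl --nodes 192.168.1.100 dashboard   # live resource and log dashboard in the terminal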

KubeVirt for VMs

Instead of a traditional hypervisor, KubeVirt extends Kubernetes to run VMs:

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: ubuntu-vm
spec:
  running: true
  template:
    spec:
      domain:
        devices:
          disks:
          - name: root
            disk:
              bus: virtio
        resources:
          requests:
            memory: 4Gi
            cpu: 2
      volumes:
      - name: root
        containerDisk:
          # Example disk source - swap in the image or PVC you actually use
          image: quay.io/containerdisks/ubuntu:22.04

VMs become Kubernetes objects. Start, stop, manage them with kubectl or our custom console.
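
For day-to-day operations this looks roughly like the following (virtctl, KubeVirt’s CLI, is assumed to be installed alongside kubectl):

# List VM definitions and running VM instances
kubectl get vms
kubectl get vmis

# Start, connect to, and stop the VM defined above
virtctl start ubuntu-vm
virtctl console ubuntu-vm
virtctl stop ubuntu-vm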

Migration Process

1. Backup Everything

Critical step. I backed up:
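
For the Proxmox guests themselves, vzdump is the usual tool; a rough sketch (the VM ID, storage name, and destination are placeholders for whatever your setup uses):

# On the Proxmox host: snapshot-mode backup of guest 101 to a backup storage
vzdump 101 --storage backup-nas --mode snapshot --compress zstd

# Copy the archive off the box before wiping it
scp /mnt/pve/backup-nas/dump/vzdump-qemu-101-*.vma.zst user@nas:/backups/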

2. Talos Installation

The Boot Challenge:

The first challenge was getting the R430 to boot from the Talos ISO instead of Proxmox. After mounting the ISO via iDRAC virtual media, the server kept booting to Proxmox’s GRUB menu.

Solution:

  1. Wiped the Proxmox bootloader: dd if=/dev/zero of=/dev/sda bs=1M count=100
  2. Reset USB status in iDRAC
  3. Remapped ISO as CD/DVD (not removable disk)
  4. Used F11 Boot Manager to explicitly select virtual media

Installation Process:

# Generate configuration
talosctl gen config tom-lab-cluster https://192.168.1.100:6443 \
  --output-dir ~/r430-migration/talos-config

# Apply configuration
talosctl apply-config --insecure --nodes 192.168.1.100 \
  --file ~/r430-migration/talos-config/controlplane.yaml

Disk Selection Issue:

Initially tried /dev/sda, but the virtual CD took that device. Switched to /dev/sdb:

machine:
  install:
    disk: /dev/sdb  # Not /dev/sda!
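
One way to check which block devices the node actually sees before committing to an install disk (run while the node is still in maintenance mode; the exact subcommand varies slightly between Talos versions):

# List disks as Talos sees them (maintenance mode, hence --insecure)
talosctl get disks --insecure --nodes 192.168.1.100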

After installation, Talos reboots and Kubernetes starts initializing.

3. Bootstrap Kubernetes

The Bootstrap Process:

# Set talosconfig
export TALOSCONFIG=~/r430-migration/talos-config/talosconfig

# Bootstrap etcd (cluster state)
talosctl bootstrap --nodes 192.168.1.100

# Get kubeconfig (with retry logic)
talosctl -e 192.168.1.100 --nodes 192.168.1.100 kubeconfig --force

# Verify cluster
kubectl get nodes
# NAME              STATUS   ROLES           AGE   VERSION
# r430-k8s-master   Ready    control-plane  5m    v1.29.1

Common Issue: kubeconfig Not Working

If you see “connection refused”, the kubeconfig might not be set correctly:

# Force regenerate
talosctl -e 192.168.1.100 --nodes 192.168.1.100 kubeconfig --force

# Verify
kubectl get nodes

Single-node cluster ready in ~5 minutes. All control plane components run on the same node.
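
A quick way to confirm everything landed on the single node:

kubectl get pods -n kube-system -o wide
# The control plane static pods (etcd, kube-apiserver, kube-controller-manager,
# kube-scheduler) should all be Running on r430-k8s-master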

4. Storage Layer

First Attempt: Longhorn

Longhorn seemed perfect - distributed storage with snapshots and backups. But we hit a critical blocker:

# Longhorn requires iSCSI
talosctl -e 192.168.1.100 --nodes 192.168.1.100 read /usr/sbin/iscsiadm
# Error: no such file or directory

Talos Linux is immutable and minimal - it doesn’t include open-iscsi by default. Adding it would require a Talos system extension, which is extra complexity.

Solution: Local Path Provisioner

Switched to Local Path Provisioner - simpler and perfect for single-node:

# Install
kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/v0.0.26/deploy/local-path-storage.yaml

# Set as default
kubectl patch storageclass local-path \
  -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

# Verify
kubectl get storageclass
# NAME         PROVISIONER            RECLAIMPOLICY
# local-path   rancher.io/local-path  Delete

PodSecurity Fix:

The provisioner’s helper pods need privileged access:

kubectl label namespace local-path-storage \
  pod-security.kubernetes.io/enforce=privileged --overwrite

Works perfectly for a single-node lab. No replication needed, fast local storage.
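
A quick smoke test for the storage class (the PVC name is arbitrary):

# Create a test claim against the default local-path storage class
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
EOF

# local-path uses WaitForFirstConsumer, so the claim stays Pending
# until a pod mounts it, then binds to a hostPath-backed volume
kubectl get pvc test-pvc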

5. Networking

MetalLB for LoadBalancer IPs:

# Install MetalLB
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.14.3/config/manifests/metallb-native.yaml

# Configure IP pool (interactive in script)
# IP range: 192.168.1.200-192.168.1.220
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default-pool
  namespace: metallb-system
spec:
  addresses:
  - 192.168.1.200-192.168.1.220
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default
  namespace: metallb-system
spec:
  ipAddressPools:
  - default-pool

Traefik as Ingress Controller:

Traefik was deployed and exposed on 192.168.1.200. Later, we added Nginx Proxy Manager for easier domain management.

Nginx Proxy Manager:

After the initial Traefik setup, we deployed NPM for better SSL certificate management and user-friendly proxy host configuration. NPM runs on 192.168.1.202 and handles all external-facing services.
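
One way an address like 192.168.1.202 can be requested explicitly from the MetalLB pool (the service name, selector, and ports below are illustrative; the annotation is MetalLB’s):

# Expose a service on a fixed address from the MetalLB pool
kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: npm-proxy
  annotations:
    metallb.universe.tf/loadBalancerIPs: 192.168.1.202
spec:
  type: LoadBalancer
  selector:
    app: nginx-proxy-manager
  ports:
  - name: http
    port: 80
  - name: https
    port: 443
EOF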

6. KubeVirt

Installation:

# Set version
export KUBEVIRT_VERSION=v1.1.2

# Install operator
kubectl apply -f https://github.com/kubevirt/kubevirt/releases/download/${KUBEVIRT_VERSION}/kubevirt-operator.yaml

# Wait for operator
kubectl wait --for=condition=ready pod -n kubevirt \
  -l kubevirt.io=virt-operator --timeout=120s

# Install KubeVirt
kubectl apply -f https://github.com/kubevirt/kubevirt/releases/download/${KUBEVIRT_VERSION}/kubevirt-cr.yaml

# Wait for components
kubectl wait --for=condition=ready pod -n kubevirt \
  -l kubevirt.io=virt-handler --timeout=300s

Hardware Virtualization Check:

The installation script checks for VT-x/AMD-V support. Even if the script reports it’s not detected, KubeVirt will automatically use hardware virtualization if available:

# Verify CPU support
talosctl -e 192.168.1.100 --nodes 192.168.1.100 read /proc/cpuinfo | grep -E 'vmx|svm'
# Should show: vmx (Intel) or svm (AMD)

VMs now run alongside containers with near-native performance.
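
To confirm the deployment is healthy before creating VMs:

# The KubeVirt CR reports "Deployed" once all components are up
kubectl -n kubevirt get kubevirt kubevirt -o jsonpath='{.status.phase}'

# virt-api, virt-controller and virt-handler pods should all be Running
kubectl get pods -n kubevirt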

Results

What Works Great

Challenges

Performance

Dell R430 specs:

Kubernetes overhead is minimal. VMs run at near-native speed with KubeVirt.

Custom Console

Why Build Custom?

KubeSphere installation failed due to:

Our Solution:

Built a lightweight custom console:

Key Challenges Solved:

Simple, fast, does exactly what we need. See the Custom Console article for full details.

Conclusion

Worth it. The migration took a weekend, but I now have:

Would I recommend it? If you:

Then yes, absolutely.


Next: Architecture Deep Dive