nerdz.cloud

RAMBLINGS// field log

Of ramblings & field notes.

Homelab, Star Citizen, 3D printing, TTRPGs — whatever I'm nerding out on. Infrequent, and at length.

RSS · /feed

WORTH YOUR TIME

2026 7 LOGS
Three 990 PROs, One Batch, All Dying — Part 3: The Part Where the Canary Lied

The canary migration went perfectly, so I ran the same playbook on the last two nodes. They found five new ways to make me earn it — node-local data that vaporises on reinstall, an OSD that booted faster than its network, a password bug I'd only half-fixed, a restore that raced itself, and a serial number I wrongly swore I couldn't read.

Three 990 PROs, One Batch, All Dying — Part 2: The Replacement

Enterprise SSDs arrived, so I migrated a live Talos control plane onto them. First I had to fix the backups, then learn that swapping a boot disk on Talos isn't a swap at all — it's a rebuild. Plus the canary node that taught me five things I only half-believed.

Migrating Ceph off Thunderbolt: from Mesh to Switched 10G

I was wrong about LACP. Time to rewire the Ceph fabric. Part 1 of 2 — the plan.

Three 990 PROs, One Batch, All Dying — Part 1: The Slow Death

What happens when you put consumer NVMe under an etcd + Ceph mon workload. Part 1 of 3.

Deploying Open Source LLMs in a Homelab - Part 4

Ditching Ollama for LocalAI, battling P2P federation that doesn't work in Kubernetes, and building a self-hosted AI stack with persistent memory.

Cloud Provider Roulette: Finding a Home for Redroid

A journey through TrueNAS, Oracle Cloud, and Hetzner before finally landing on AWS Graviton for running Android containers with acceptable latency from New Zealand.

Running Game Servers from a NAS: Pterodactyl + TrueNAS

Deploying Pterodactyl Panel on Kubernetes with Wings running on TrueNAS for self-hosted game server management

2025 19 LOGS
CephFS Sparse File Corruption: A Data Recovery Story

How a CephFS sparse file handling quirk silently corrupted my app configs during VolSync restores—and the multi-day recovery effort across qbittorrent, sabnzbd, sonarr, radarr, and filebrowser using a mix of Kopia snapshots and old Restic backups.

When BGP Doesn't Fix Hairpin: Cilium DSR and the Same-Node Problem

BGP was supposed to fix my hairpin routing issues. It didn't. Here's how CoreDNS rewriting saved the day when pods couldn't reach LoadBalancer VIPs on the same node.

Upgrading Ceph from Reef to Tentacle in a Rook-Managed Cluster

A real-world walkthrough of upgrading Ceph from v18 (Reef) through v19 (Squid) to v20 (Tentacle) via GitOps—including the correction of my wrong assumptions about Rook version constraints.

pgBackRest: Multi-Destination PostgreSQL Backups in CloudNativePG

How I replaced Barman Cloud Plugin with pgBackRest to get true dual-destination full backups to both Backblaze B2 and Cloudflare R2, then migrated my entire PostgreSQL infrastructure to PostgreSQL 18.

Self-Hosting Kubernetes CRD Schemas

Why I deployed a self-hosted GitHub Actions runner and Cloudflare Pages to serve JSON schemas extracted from my cluster's CRDs, eliminating dependency on third-party schema hosts.

From L2 Announcements to BGP: Migrating Cilium LoadBalancer IPs

Why I moved from Cilium L2 announcements to BGP for LoadBalancer IP advertisement, and how a dedicated Services VLAN simplified everything.

Defragmenting etcd in a Talos Kubernetes Cluster

Why etcd fragments over time and how to reclaim disk space with talosctl etcd defrag.

Migrating Volsync from Restic to Kopia

How I migrated my Kubernetes PVC backups from Restic to Kopia with a 3-2-1 backup strategy: hourly NFS backups for fast restores, plus daily cloud backups to Backblaze B2 and Cloudflare R2 for disaster recovery.

Migrating Flux Kustomizations Out of flux-system

Why I moved every Flux Kustomization into its target namespace, the challenges with substituteFrom, and how strategic patching made it work.

Killing 23 Tailscale Proxies with Split DNS

How I replaced per-app Tailscale ingresses with a single Connector and Split DNS for same-URL-everywhere remote access

Running 14 MCP Servers in VS Code for Homelab Mastery

How I configured Model Context Protocol servers to give Claude Code superpowers over my Kubernetes cluster

Rebuilding My Talos Cluster from Bare Metal

What I broke, how I wiped everything, and the steps I'm using to bootstrap Talos + Flux again.

Tesla Integration with Home Assistant

Setting up the official Tesla Fleet Addon for Home Assistant with Kubernetes

Deploying Open Source LLMs in a Homelab - Part 3

Rolling out Ollama in Kubernetes with shared storage and Open-WebUI

Deploying Open Source LLMs in a Homelab - Part 2

Prerequisites and getting Open-WebUI up and running

Understanding AI: Generative AI, LLMs, ML & the Open vs Closed Source Debate

Breaking down the buzzwords and tech behind today's AI boom

qBittorrent Woes

All connections stopped

Migrating Database Clusters

When recovery goes bad

Car Care

Protecting your investment

2024 3 LOGS