Limited Time Launch Offer 268 Pages (PDF) · Production-Grade kubectl · Free Companion Guide

Command the Cluster — Master kubectl for Production

36 chapters of production-grade kubectl — from everyday commands to 3 AM break-glass survival. The jsonpath tricks, debugging workflows, and on-call judgment you won't find in a cheat sheet.

268 Pages (PDF)

Free Companion Guide

Regular price $6.34, launch price $2.11

Special Launch Discount: 67% OFF

Secure 256-bit SSL checkout. Instant signed PDF download.

Not Another Cheat Sheet

A free cheat sheet is just command → description. You can Google that. This field manual gives you the judgment around the command — when to reach for one over another, how to read the output that actually matters, and the gotchas you only learn on-call at 3 AM with a pager going off.

From Everyday Commands to Break-Glass

This manual spans the full range: basic kubectl fluency for juniors, all the way to recovering a cluster when the API server itself is down. Advanced jsonpath, ephemeral-container debugging, RBAC auditing, etcd operations, and the dangerous commands done safely.

What You'll Master

12 core production operational topics built directly into the manual.

JSONPATH MASTERY

Turn kubectl into a cluster-wide query engine.

CRASHLOOP DECODE

Read exit codes 137/143 and find the real cause.

DISTROLESS DEBUG

Debug a no-shell container with ephemeral containers.

DNS / NETWORK TRACE

Find the 5-second timeout buried in the kernel.

BREAK-GLASS

Recover a cluster when the API server is down.

RBAC AUDIT

Find who can exec into production, at scale.

SAFE FORCE DELETE

Force-delete a stuck pod without corrupting data.

ETCD OPERATIONS

Snapshot, defrag, and restore etcd safely.

FINALIZER REMOVAL

Clear a stuck namespace without orphaning resources.

NODE PRESSURE

Diagnose OOMKills and evictions at the node level.

FIRST 5 COMMANDS

The reflex sequence for every incident type.

KREW FIELD KIT

The plugins that actually matter in production.

What Makes This Different?

A cheat sheet shows you the command. This book shows you the judgment.

The Situation

"A pod is stuck Terminating. The fix on Stack Overflow is kubectl delete pod --force --grace-period=0."

❌ What most engineers do

Run the force-delete immediately. It clears the stuck pod — problem solved. Or so it looks.

🎯 What the manual teaches you to check first

Is the node actually dead, or just unreachable? Because --force deletes the pod record in the API server, not the container itself. If the node is alive, the container keeps running — and for a StatefulSet, a replacement spins up with the exact same identity, and now two pods are writing to the same disk simultaneously.

✅ What you learn to do instead

Confirm the node is genuinely dead (via node status & cloud console) before forcing. If it's alive, resolve why the pod won't terminate — a stuck preStop hook, a long grace period, a finalizer — rather than forcing past it.
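The node check described above can be sketched as a small shell helper. This is a minimal sketch, not the manual's own procedure: the function name `pod_node_ready` and the namespace and pod names in the usage note are hypothetical. It only reads the Ready condition the API server reports for the node hosting the pod.

```shell
# Hypothetical helper: print the Ready condition of the node that hosts a pod.
# Usage: pod_node_ready <namespace> <pod>
pod_node_ready() {
  local ns="$1" pod="$2" node ready

  # Which node is the pod scheduled on?
  node="$(kubectl get pod "$pod" -n "$ns" -o jsonpath='{.spec.nodeName}')" || return 1
  if [ -z "$node" ]; then
    echo "pod $ns/$pod has no node assigned" >&2
    return 1
  fi

  # Ready is True, False, or Unknown (Unknown: the kubelet stopped reporting).
  ready="$(kubectl get node "$node" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')" || return 1

  echo "$node Ready=$ready"
}
```

For example, `pod_node_ready prod db-0` might print `node-a Ready=Unknown`. Unknown only means the control plane has lost contact with the kubelet. It does not prove the node is dead, which is why the cloud-console check still comes before any force-delete.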

→ That difference — knowing what a command actually overrides — is what separates a clean fix from a 2 AM data-corruption incident.

How Every Entry Is Built

What It Does

The command, in plain English. No padding.

When To Use It

The judgment: why this command, not the alternative.

Reading The Output

Real, trimmed output with a finger on exactly what matters.

Field Note

The gotcha you only learn on-call, with the safe alternative for dangerous commands.

Why This is Not a Generic Cheat Sheet

Judgment Over Syntax

Every high-value entry covers when to use it, how to read the output, and the on-call gotcha — not just what to type.

Junior to Senior

Everyday kubectl fluency through expert break-glass recovery. Every entry tagged so you know where the value is.

Dangerous Commands, Done Safely

Every destructive command is paired with its blast radius and a safe alternative. Force-delete and finalizer removal are never presented unguarded.

Gain access to all 36 chapters of production-grade kubectl and break-glass workflows

Explore the 36 Chapters

Chapter 1 kubectl Mechanics
Chapter 2 Context, Config & Productivity
Chapter 3 Everyday CRUD

Chapter 4 Pods & Containers
Chapter 5 Deployments, ReplicaSets, StatefulSets & DaemonSets
Chapter 6 Services, Endpoints & EndpointSlices
Chapter 7 Ingress & Ingress Controllers
Chapter 8 ConfigMaps & Secrets
Chapter 9 Storage: PV / PVC / StorageClass / CSI
Chapter 10 Nodes
Chapter 11 Namespaces, Jobs/CronJobs & Quotas
Chapter 12 RBAC & ServiceAccounts
Chapter 13 etcd & Control-Plane Objects

Chapter 14 Advanced Output Control & Query Mastery
Chapter 15 jq & yq Recipes
Chapter 16 Multi-Pod Logs & TUIs
Chapter 17 The krew Plugin Field Kit
Chapter 18 Node & Runtime CLIs (When kubectl Isn't Enough)

Chapter 19 Debugging a Crashing Pod (CrashLoopBackOff)
Chapter 20 Debugging Distroless / No-Shell Containers
Chapter 21 Trace a Network / DNS Issue
Chapter 22 Audit RBAC / Permissions
Chapter 23 Investigate Node Pressure / Eviction / OOM
Chapter 24 Recover a Stuck / Unschedulable / Pending Pod
Chapter 25 Find What's Consuming Resources
Chapter 26 Recover Storage
Chapter 27 Diagnose a Slow / Overloaded API Server
Chapter 28 Investigate a Bad Rollout / Deployment

Chapter 29 Control-Plane Health
Chapter 30 etcd Operations (The Safe Production Commands)

Chapter 31 When kubectl Returns Nothing

Chapter 32 Safe Force Operations
Chapter 33 What NOT to Do (Outage-Causers and Safe Alternatives)

Chapter 34 "First 5 Commands" Cards by Symptom
Chapter 35 Decoder Tables
Chapter 36 Copy-Paste jsonpath / jq One-Liner Appendix

Master advanced jsonpath query techniques, distroless container debugging, and quorum operations

High-Value Production Scenarios Covered

Whether you are a Kubernetes beginner or an experienced SRE, this manual bridges the gap between basic tutorials and complex live production operations. We don't just teach you syntax — we explain how systems break, what warning signs to monitor, and how to fix them safely.

jsonpath at Scale

Pull restart reasons, image inventories, and pods-without-limits across the whole cluster in one query. Plus the trait that bites everyone: jsonpath fails silently.
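As an illustration of a cluster-wide jsonpath query, here is a minimal sketch that lists the last termination reason for every pod that has one. The function name `last_termination_reasons` is hypothetical, and the query is one reasonable way to do it, not necessarily the book's recipe.

```shell
# Hypothetical helper: namespace, pod, and last termination reason(s),
# one tab-separated row per pod, across all namespaces.
last_termination_reasons() {
  kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}' \
    | awk -F '\t' '$3 != ""'   # drop pods that have never been terminated
}
```

The silent-failure trait shows up here too: a misspelled field such as `.status.containerStatus` produces empty columns rather than an error, so an empty result can mean "no restarts" or "wrong path". Passing `--allow-missing-template-keys=false` to `kubectl get` turns a missing key into an error instead.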

Decode the Distroless

No shell, no tools, still broken. Use ephemeral containers to attach a debug image and inspect a container that gives you nothing to work with.
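A minimal sketch of the ephemeral-container approach, wrapped in a hypothetical helper named `debug_distroless`. The debug image `busybox:1.36` is an assumption; any image with the tools you need would do.

```shell
# Hypothetical helper: attach an ephemeral debug container to a running pod.
# Usage: debug_distroless <namespace> <pod> <target-container>
debug_distroless() {
  local ns="$1" pod="$2" container="$3"

  # --target asks for the debug container to share the target container's
  # process namespace, so its processes are visible from the debug shell.
  kubectl debug -it "$pod" -n "$ns" --image=busybox:1.36 --target="$container"
}
```

For example, `debug_distroless prod api-1 app` opens a busybox shell alongside the `app` container. Process-namespace targeting depends on container runtime support, so treat `--target` as something to verify on your cluster.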

The 5-Second DNS

Trace a DNS timeout down to the kernel conntrack race, and know why NodeLocal DNSCache is the real fix.

API Server Down

When kubectl returns nothing, work the node directly with crictl and etcdctl to bring the control plane back.
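A first step on the node itself can be sketched as follows. This assumes a control plane where kube-apiserver runs as a container visible to `crictl` (for example a kubeadm-style static pod) and that you are on the control-plane node with sufficient privileges. The helper name `apiserver_logs` is hypothetical, and the sketch covers only the crictl side, not etcdctl.

```shell
# Hypothetical helper: show the tail of the kube-apiserver container log
# straight from the container runtime, without going through the API server.
# Usage: apiserver_logs [lines]
apiserver_logs() {
  local lines="${1:-50}" id

  # -a includes exited containers, -q prints IDs only; take the first ID listed.
  id="$(crictl ps -a --name kube-apiserver -q | head -n 1)"
  if [ -z "$id" ]; then
    echo "no kube-apiserver container found on this node" >&2
    return 1
  fi

  crictl logs --tail "$lines" "$id"
}
```

Running `apiserver_logs 100` prints the last 100 log lines, which is often enough to tell a certificate problem from an etcd connectivity problem.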

Exit Code Decoder

137 vs 143 vs 126 vs 127 — what each one means, the 128+N arithmetic, and where to look next.
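The 128+N arithmetic can be sketched in a few lines of shell. The function name `decode_exit` is hypothetical and the table is deliberately small: it names only the two signals mentioned above and reports any other signal by number.

```shell
# Hypothetical helper: decode a container exit code.
# Codes above 128 mean "terminated by signal (code - 128)".
decode_exit() {
  local code="$1" sig name
  case "$code" in
    0)   echo "0: clean exit" ;;
    126) echo "126: command found but not executable" ;;
    127) echo "127: command not found" ;;
    *)
      if [ "$code" -gt 128 ]; then
        sig=$((code - 128))
        case "$sig" in
          9)  name="SIGKILL" ;;
          15) name="SIGTERM" ;;
          *)  name="signal $sig" ;;
        esac
        echo "$code: 128+$sig, terminated by $name"
      else
        echo "$code: application-defined error"
      fi
      ;;
  esac
}
```

`decode_exit 137` prints `137: 128+9, terminated by SIGKILL`, and `decode_exit 143` prints `143: 128+15, terminated by SIGTERM`. A SIGKILL is frequently the OOM killer but not always, so the container's last termination reason is the next place to look.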

Safe Force-Delete

Why --force is a delete in disguise, the StatefulSet corruption it causes, and how to do it without losing data.

Who is This Book For?

This guide is built for engineers looking to master advanced Kubernetes operations and on-call troubleshooting patterns.

The Junior Engineer

Build real kubectl fluency fast, and learn the judgment behind the commands before you're on-call alone.

The On-Call SRE

A reflex library for incidents: the first commands, the decoder tables, the break-glass moves when kubectl stops answering.

The Senior / Staff Engineer

The signature chapters — break-glass, dangerous commands, etcd, control-plane recovery — for when you're the one others escalate to.

The Interview Candidate

The perfect companion to the scenarios book: this is the what to type, that is the how to think.

What Value This Book Adds

Most cheat sheets stop at simple command lists. This book teaches you production SRE operational judgment and safety boundaries.

Production-Grade, Not minikube

Real debugging workflows, real break-glass recovery, the commands you run when something is actually on fire.

Every Dangerous Command, Made Safe

Blast-radius notes and safe alternatives on every destructive operation. Fix the incident without causing the next one.

Print-and-Pin Quick Reference

The "First 5 Commands" cards and decoder tables are built to stick on your monitor for the next page.

kubectl Cheat Sheet vs. On-Call Field Manual

$ k get nodes

NAME STATUS ROLES AGE VERSION

node-prod-01 NotReady worker 45d v1.30.0

# Traditional Cheat Sheet:

- kubectl get nodes (shows status, nothing else)

- kubectl describe node node-prod-01 (dumps pages of output)

# Field Manual Judgment (Learned in Book):

+ Run reflex "First 5 Commands" for Node pressure (Ch 34)

+ Determine if evicted (kubelet) or OOMKilled (cgroup) (Ch 23)

+ Execute safe member removal & defrag on etcd (Ch 30)

Designed by SREs for high-velocity platform engineers

Frequently Asked Questions

Have questions about the field manual? Find quick answers below.

Who is this manual for?

It works for juniors and seniors both. Early chapters build everyday kubectl fluency; later chapters go to expert break-glass and control-plane recovery. If you can already run basic kubectl commands, you'll get value from page one.

What format is the book?

A high-quality PDF, available instantly in your secure dashboard after payment.

Does this replace the scenarios book?

No, they're companions. The scenarios book teaches you how to think through production failures. This field manual teaches you exactly what to type. Together they cover judgment and execution.

Which Kubernetes version does it cover?

Every command is written and validated against Kubernetes 1.30, with version-sensitive commands flagged inline. Lifetime updates included.

Is payment secure?

Yes. All payments go through Razorpay (UPI, cards, net banking, wallets). Your financial information is encrypted and never stored on our servers.

Download the dynamic watermarked SRE handbook instantly

PREVIEW CHAPTER

Sample Scenario Sneak Peek

Take a look at how real production incidents are documented and resolved in the field manual.

SEV-1 CRITICAL

Field Manual Entry: Chapter 21 Preview

Page 168 RESOLVED

Every DNS Lookup Suddenly Takes 5+ Seconds

SYMPTOMS & IMPACT

Applications throw connection-timeout alerts under high concurrent load. Latency for service-discovery lookups spikes to almost exactly 5000 ms, while CoreDNS CPU usage stays completely normal.

DIAGNOSIS & CAUSE

1. Time a DNS lookup from inside an application container to confirm the service-discovery latency:

# Execute test lookup inside target application container

$ kubectl exec -it app-pod -- time nslookup kubernetes.default

Server: 10.96.0.10

Address: 10.96.0.10#53

Name: kubernetes.default.svc.cluster.local

Address: 10.96.0.1

real 0m 5.008s

user 0m 0.002s

sys 0m 0.005s

2. Root Cause: This is a race condition in the Linux kernel's netfilter conntrack module, triggered when a resolver sends parallel A and AAAA lookups over UDP from the same socket. Under load, one of the two packets loses the race to insert its conntrack entry during NAT and is dropped, so the resolver waits out its 5-second timeout before retrying.

RESOLUTION RUNBOOK

Deploy NodeLocal DNSCache as a DaemonSet to intercept UDP DNS queries locally, bypassing the kernel conntrack NAT table entirely.
Reduce search-domain expansion, and with it the number of lookups per name, by setting the ndots option to 1 in the pod's dnsConfig.
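The ndots change from the runbook can be sketched as a merge patch against a Deployment's pod template. The helper names `ndots_patch` and `set_ndots` are hypothetical, and the namespace and Deployment names are whatever applies in your cluster. Note that a merge patch replaces any existing `dnsConfig.options` list, and that changing the pod template triggers a rollout.

```shell
# Hypothetical helper: print a merge patch that sets the ndots resolver option.
# dnsConfig option values are strings, hence the quoted number.
ndots_patch() {
  local value="${1:-1}"
  printf '{"spec":{"template":{"spec":{"dnsConfig":{"options":[{"name":"ndots","value":"%s"}]}}}}}\n' "$value"
}

# Hypothetical helper: apply the patch to a Deployment.
# Usage: set_ndots <namespace> <deployment> [ndots-value]
set_ndots() {
  local ns="$1" deploy="$2" value="${3:-1}"
  kubectl patch deployment "$deploy" -n "$ns" --type merge -p "$(ndots_patch "$value")"
}
```

Running `ndots_patch` on its own lets you review the JSON before `set_ndots prod api` applies it.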

About the Author: Joydeep Mondal

Joydeep Mondal is a Senior SRE and platform engineer specializing in national-scale, citizen-facing government platforms operating 24x7 with no maintenance window. He builds resilient system boundaries and guides engineering organizations in resolving critical production incidents.