268 Pages (PDF)
Free Companion Guide
Limited Time Launch Offer 268 Pages (PDF) · Production-Grade kubectl · Free Companion Guide

Command the Cluster — Master kubectl for Production

36 chapters of production-grade kubectl — from everyday commands to 3 AM break-glass survival. The jsonpath tricks, debugging workflows, and on-call judgment you won't find in a cheat sheet.

$2.11 (regular price $6.34)
Special Launch Discount: 67% OFF

Powered by Razorpay

Secure 256-bit SSL checkout. Instant signed PDF download.

Not Another Cheat Sheet

A free cheat sheet is just command → description. You can Google that. This field manual gives you the judgment around the command — when to reach for one over another, how to read the output that actually matters, and the gotchas you only learn on-call at 3 AM with a pager going off.

From Everyday Commands to Break-Glass

This manual spans the full range: basic kubectl fluency for juniors, all the way to recovering a cluster when the API server itself is down. Advanced jsonpath, ephemeral-container debugging, RBAC auditing, etcd operations, and the dangerous commands done safely.

What You'll Master

Twelve core production topics covered in the manual.

JSONPATH MASTERY

Turn kubectl into a cluster-wide query engine.

CRASHLOOP DECODE

Read exit codes 137/143 and find the real cause.

DISTROLESS DEBUG

Debug a no-shell container with ephemeral containers.

DNS / NETWORK TRACE

Find the 5-second timeout buried in the kernel.

BREAK-GLASS

Recover a cluster when the API server is down.

RBAC AUDIT

Find who can exec into production, at scale.

SAFE FORCE DELETE

Force-delete a stuck pod without corrupting data.

ETCD OPERATIONS

Snapshot, defrag, and restore etcd safely.

FINALIZER REMOVAL

Clear a stuck namespace without orphaning resources.

NODE PRESSURE

Diagnose OOMKills and evictions at the node level.

FIRST 5 COMMANDS

The reflex sequence for every incident type.

KREW FIELD KIT

The plugins that actually matter in production.

What Makes This Different?

A cheat sheet shows you the command. This book shows you the judgment.

The Situation
"A pod is stuck Terminating. The fix on Stack Overflow is kubectl delete pod --force --grace-period=0."
❌ What most engineers do
Run the force-delete immediately. It clears the stuck pod — problem solved. Or so it looks.
🎯 What the manual teaches you to check first
Is the node actually dead, or just unreachable? Because --force deletes the pod record in the API server, not the container itself. If the node is alive, the container keeps running — and for a StatefulSet, a replacement spins up with the exact same identity, and now two pods are writing to the same disk simultaneously.
✅ What you learn to do instead
Confirm the node is genuinely dead (via node status & cloud console) before forcing. If it's alive, resolve why the pod won't terminate — a stuck preStop hook, a long grace period, a finalizer — rather than forcing past it.
→ That difference — knowing what a command actually overrides — is what separates a clean fix from a 2 AM data-corruption incident.
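
The check described above can be sketched in shell. This is a minimal illustration, not the manual's own procedure: the function names are hypothetical, and it assumes kubectl is already pointed at the right cluster. A Ready status of Unknown only means the API server has lost contact with the kubelet, so it still needs confirming in the cloud console before any force-delete.

```shell
# Sketch: find out whether a pod's node still reports Ready before forcing.
# Function names are hypothetical; the kubectl flags are standard.

# Print the node a pod is scheduled on.
pod_node() {            # usage: pod_node <namespace> <pod>
  kubectl get pod "$2" -n "$1" -o jsonpath='{.spec.nodeName}'
}

# Print the node's Ready condition: True, False, or Unknown.
node_ready_status() {   # usage: node_ready_status <node>
  kubectl get node "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
}

# Return 0 if the pod's node reports Ready=True, 1 if not, 2 if a lookup failed.
pod_node_is_ready() {   # usage: pod_node_is_ready <namespace> <pod>
  pnr_node="$(pod_node "$1" "$2")" || return 2
  pnr_status="$(node_ready_status "$pnr_node")" || return 2
  echo "node=$pnr_node Ready=$pnr_status"
  [ "$pnr_status" = "True" ]
}
```

If `pod_node_is_ready prod web-0` succeeds, the node is alive and the container is probably still running, so the stuck preStop hook, grace period, or finalizer is the thing to investigate rather than forcing past it.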

How Every Entry Is Built

What It Does
The command, in plain English. No padding.
When To Use It
The judgment: why this command, not the alternative.
Reading The Output
Real, trimmed output with a finger on exactly what matters.
Field Note
The gotcha you only learn on-call, with the safe alternative for dangerous commands.

Why This is Not a Generic Cheat Sheet

Judgment Over Syntax

Every high-value entry covers when to use it, how to read the output, and the on-call gotcha — not just what to type.

Junior to Senior

Everyday kubectl fluency through expert break-glass recovery. Every entry tagged so you know where the value is.

Dangerous Commands, Done Safely

Every destructive command is paired with its blast radius and a safe alternative. Force-delete and finalizer removal are never presented unguarded.

Gain access to all 36 chapters of production-grade kubectl and break-glass workflows

Explore the 36 Chapters

The production-grade kubectl syllabus included in this field manual:

  • Chapter 1 kubectl Mechanics
  • Chapter 2 Context, Config & Productivity
  • Chapter 3 Everyday CRUD

  • Chapter 4 Pods & Containers
  • Chapter 5 Deployments, ReplicaSets, StatefulSets & DaemonSets
  • Chapter 6 Services, Endpoints & EndpointSlices
  • Chapter 7 Ingress & Ingress Controllers
  • Chapter 8 ConfigMaps & Secrets
  • Chapter 9 Storage: PV / PVC / StorageClass / CSI
  • Chapter 10 Nodes
  • Chapter 11 Namespaces, Jobs/CronJobs & Quotas
  • Chapter 12 RBAC & ServiceAccounts
  • Chapter 13 etcd & Control-Plane Objects

  • Chapter 14 Advanced Output Control & Query Mastery
  • Chapter 15 jq & yq Recipes
  • Chapter 16 Multi-Pod Logs & TUIs
  • Chapter 17 The krew Plugin Field Kit
  • Chapter 18 Node & Runtime CLIs (When kubectl Isn't Enough)

  • Chapter 19 Debugging a Crashing Pod (CrashLoopBackOff)
  • Chapter 20 Debugging Distroless / No-Shell Containers
  • Chapter 21 Trace a Network / DNS Issue
  • Chapter 22 Audit RBAC / Permissions
  • Chapter 23 Investigate Node Pressure / Eviction / OOM
  • Chapter 24 Recover a Stuck / Unschedulable / Pending Pod
  • Chapter 25 Find What's Consuming Resources
  • Chapter 26 Recover Storage
  • Chapter 27 Diagnose a Slow / Overloaded API Server
  • Chapter 28 Investigate a Bad Rollout / Deployment

  • Chapter 29 Control-Plane Health
  • Chapter 30 etcd Operations (The Safe Production Commands)

  • Chapter 31 When kubectl Returns Nothing

  • Chapter 32 Safe Force Operations
  • Chapter 33 What NOT to Do (Outage-Causers and Safe Alternatives)

  • Chapter 34 "First 5 Commands" Cards by Symptom
  • Chapter 35 Decoder Tables
  • Chapter 36 Copy-Paste jsonpath / jq One-Liner Appendix

Master advanced jsonpath query techniques, distroless container debugging, and quorum operations

High-Value Production Scenarios Covered

Whether you are a Kubernetes beginner or an experienced SRE, this manual bridges the gap between basic tutorials and complex live production operations. We don't just teach you syntax — we explain how systems break, what warning signs to monitor, and how to fix them safely.

jsonpath at Scale
HV

Pull restart reasons, image inventories, and pods-without-limits across the whole cluster in one query. Plus the trait that bites everyone: jsonpath fails silently.
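
As one illustration of a cluster-wide query, an image inventory might look like the sketch below. It is not taken from the manual: the function name is hypothetical, and it assumes read access to pods in every namespace.

```shell
# Sketch: count how many containers run each image across the cluster.
# image_inventory is a hypothetical helper name.
image_inventory() {
  kubectl get pods --all-namespaces \
    -o jsonpath='{range .items[*]}{range .spec.containers[*]}{.image}{"\n"}{end}{end}' \
    | sort | uniq -c | sort -rn
}
```

Because a mistyped jsonpath field produces empty output rather than an error, an empty result here is worth double-checking against the path before concluding there is nothing to report.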

Decode the Distroless
HV

No shell, no tools, still broken. Use ephemeral containers to attach a debug image and inspect a container that gives you nothing to work with.
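
A minimal sketch of attaching an ephemeral debug container follows. The wrapper name and the busybox image are assumptions chosen for illustration; `--target` shares the target container's process namespace only when the container runtime supports it.

```shell
# Sketch: attach a busybox ephemeral container to a running pod.
# debug_distroless is a hypothetical wrapper; the image choice is an assumption.
debug_distroless() {   # usage: debug_distroless <namespace> <pod> <container>
  kubectl debug -it "$2" -n "$1" --image=busybox:1.36 --target="$3"
}
```

For example, `debug_distroless prod web-0 app` would open a shell next to the `app` container of pod `web-0` (both names hypothetical).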

The 5-Second DNS
HV

Trace a DNS timeout down to the kernel conntrack race, and know why NodeLocal DNSCache is the real fix.

API Server Down
HV

When kubectl returns nothing, work the node directly with crictl and etcdctl to bring the control plane back.

Exit Code Decoder
HV

137 vs 143 vs 126 vs 127 — what each one means, the 128+N arithmetic, and where to look next.
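
The 128+N arithmetic can be sketched as a small decoder. The helper name is hypothetical and only the two signals mentioned here are named; by shell convention, an exit code above 128 is 128 plus the number of the signal that terminated the process.

```shell
# Sketch: decode a container exit code using the 128+N convention.
# decode_exit_code is a hypothetical helper name.
decode_exit_code() {    # usage: decode_exit_code <code>
  code="$1"
  case "$code" in
    0)   echo "0: clean exit" ;;
    126) echo "126: command found but not executable" ;;
    127) echo "127: command not found" ;;
    *)
      if [ "$code" -gt 128 ] && [ "$code" -le 192 ]; then
        sig=$((code - 128))          # signal number N in 128+N
        case "$sig" in
          9)  name="SIGKILL" ;;
          15) name="SIGTERM" ;;
          *)  name="signal $sig" ;;
        esac
        echo "$code: 128+$sig, terminated by $name"
      else
        echo "$code: application-defined exit status"
      fi ;;
  esac
}

decode_exit_code 137   # prints "137: 128+9, terminated by SIGKILL"
decode_exit_code 143   # prints "143: 128+15, terminated by SIGTERM"
```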

Safe Force-Delete
HV

Why --force is a delete in disguise, the StatefulSet corruption it causes, and how to do it without losing data.

Finalizer Removal
HV

Why a namespace gets stuck Terminating, what the finalizer was protecting, and how to clear it without orphaning cloud resources.

etcd Operations
HV

Snapshot, status, defrag, and restore — the safe production commands, with the quorum math.
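
The quorum math itself is short enough to sketch. The helper names are hypothetical; the formula is the standard majority rule, where a cluster of n members needs floor(n/2)+1 of them to agree and tolerates the loss of the rest.

```shell
# Sketch: etcd quorum and fault tolerance for a given member count.
# Helper names are hypothetical.
etcd_quorum() {            # usage: etcd_quorum <members>
  echo $(( $1 / 2 + 1 ))
}
etcd_fault_tolerance() {   # usage: etcd_fault_tolerance <members>
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 3 4 5; do
  echo "members=$n quorum=$(etcd_quorum "$n") tolerates=$(etcd_fault_tolerance "$n")"
done
```

Four members tolerate one failure, the same as three, which is why odd cluster sizes are the usual choice.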

RBAC Audit
HV

Use auth can-i --as and who-can to find every identity that can exec into prod or escalate to cluster-admin.
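
A minimal sketch of the impersonation check, using only `kubectl auth can-i`: the function name and the identities passed to it are hypothetical, and the caller needs permission to impersonate. Exec is authorized as the `create` verb on the `pods/exec` subresource.

```shell
# Sketch: print each identity that is allowed to exec into pods in a namespace.
# who_can_exec is a hypothetical helper, not the who-can krew plugin.
who_can_exec() {   # usage: who_can_exec <namespace> <identity>...
  wce_ns="$1"
  shift
  for wce_identity in "$@"; do
    # can-i exits 0 when the action is allowed and non-zero when it is not.
    if kubectl auth can-i create pods/exec -n "$wce_ns" --as "$wce_identity" \
        >/dev/null 2>&1; then
      echo "$wce_identity"
    fi
  done
}
```

For example, `who_can_exec prod alice system:serviceaccount:ci:deployer` checks one user and one service account, both made-up names.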

Node Pressure & OOM
HV

Tell OOMKilled (cgroup) from eviction (kubelet) from preemption (scheduler) — the three-layer model most people confuse.

The "First 5 Commands"
HV

A fixed reflex sequence for each incident type, so you don't think under pressure — you reach.

The krew Field Kit
HV

neat, tree, who-can, sniff, df-pv, node-shell — the plugins that actually earn their place on-call.

Who is This Book For?

This guide is built for engineers looking to master advanced Kubernetes operations and on-call troubleshooting patterns.

The Junior Engineer

Build real kubectl fluency fast, and learn the judgment behind the commands before you're on-call alone.

The On-Call SRE

A reflex library for incidents: the first commands, the decoder tables, the break-glass moves when kubectl stops answering.

The Senior / Staff Engineer

The signature chapters — break-glass, dangerous commands, etcd, control-plane recovery — for when you're the one others escalate to.

The Interview Candidate

The perfect companion to the scenarios book: this is the what to type, that is the how to think.

What Value This Book Adds

Most cheat sheets stop at simple command lists. This book teaches the operational judgment and safety boundaries that production SRE work demands.

  • Production-Grade, Not minikube

    Real debugging workflows, real break-glass recovery, the commands you run when something is actually on fire.

  • Every Dangerous Command, Made Safe

    Blast-radius notes and safe alternatives on every destructive operation. Fix the incident without causing the next one.

  • Print-and-Pin Quick Reference

    The "First 5 Commands" cards and decoder tables are built to stick on your monitor for the next page.

kubectl Cheat Sheet vs. On-Call Field Manual

$ k get nodes
NAME           STATUS     ROLES    AGE   VERSION
node-prod-01   NotReady   worker   45d   v1.30.0
# Traditional Cheat Sheet:
- kubectl get nodes (shows status, nothing else)
- kubectl describe node node-prod-01 (dumps pages of output)
# Field Manual Judgment (Learned in Book):
+ Run the reflex "First 5 Commands" for node pressure (Ch 34)
+ Determine whether pods were evicted (kubelet) or OOMKilled (cgroup) (Ch 23)
+ Execute safe member removal & defrag on etcd (Ch 30)

Designed by SREs for high-velocity platform engineers

Frequently Asked Questions

Have questions about the field manual? Find quick answers below.

The manual works for both juniors and seniors. Early chapters build everyday kubectl fluency; later chapters go to expert break-glass and control-plane recovery. If you can already run basic kubectl commands, you'll get value from page one.

The manual is a high-quality PDF, available instantly in your secure dashboard after payment.

This manual does not replace the scenarios book; they are companions. The scenarios book teaches you how to think through production failures. This field manual teaches you exactly what to type. Together they cover judgment and execution.

Every command is written and validated against Kubernetes 1.30, with version-sensitive commands flagged inline. Lifetime updates included.

Payment is secure: all payments go through Razorpay (UPI, cards, net banking, wallets). Your financial information is encrypted and never stored on our servers.

Download the dynamically watermarked SRE handbook instantly

PREVIEW CHAPTER

Sample Scenario Sneak Peek

Take a look at how real production incidents are documented and resolved in the field manual.

SEV-1 CRITICAL

Field Manual Entry: Chapter 21 Preview

Page 168 RESOLVED

Every DNS Lookup Suddenly Takes 5+ Seconds

SYMPTOMS & IMPACT
Applications throw connection-timeout alerts under high concurrent load. Latency for service discovery lookups spikes to almost exactly 5000 ms, while CoreDNS CPU usage stays normal.
DIAGNOSIS & CAUSE

1. Time a DNS lookup from inside an application container to confirm the delay:

sh - app-pod
# Execute test lookup inside target application container
$ kubectl exec -it app-pod -- time nslookup kubernetes.default
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: kubernetes.default.svc.cluster.local
Address: 10.96.0.1
real 0m 5.008s
user 0m 0.002s
sys 0m 0.005s

2. Root Cause: A race condition in the Linux kernel's netfilter conntrack module, triggered when the resolver sends parallel A and AAAA lookups over UDP from the same socket. Under load, both packets race to insert a conntrack entry during NAT, the kernel drops the loser, and the resolver retries only after its 5-second timeout.

RESOLUTION RUNBOOK
  • Deploy NodeLocal DNSCache as a DaemonSet so pods query a cache on their own node, keeping those UDP lookups out of the conntrack NAT path.
  • Cut search-path expansion, and with it the number of lookups per name, by setting ndots: 1 in the pod's dnsConfig.
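
The ndots change from the runbook could be applied with a merge patch along these lines. This is a sketch under assumptions: the function name is hypothetical, the workload is assumed to be a Deployment, and the patch replaces any existing dnsConfig options list and triggers a rollout.

```shell
# Sketch: set the ndots resolver option in a Deployment's pod template.
# set_ndots is a hypothetical helper; dnsConfig.options is the standard pod spec field.
set_ndots() {   # usage: set_ndots <namespace> <deployment> <ndots>
  kubectl patch deployment "$2" -n "$1" --type merge -p \
    "{\"spec\":{\"template\":{\"spec\":{\"dnsConfig\":{\"options\":[{\"name\":\"ndots\",\"value\":\"$3\"}]}}}}}"
}
```

For example, `set_ndots prod checkout 1` would patch a Deployment named `checkout` (a made-up name) in the `prod` namespace.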

About the Author: Joydeep Mondal

Joydeep Mondal is a Senior SRE and platform engineer specializing in national-scale, citizen-facing government platforms operating 24x7 with no maintenance window. He builds resilient system boundaries and guides engineering organizations in resolving critical production incidents.

Command the Cluster

Master kubectl for production — from everyday commands to break-glass recovery. The field manual senior engineers wish they'd had on day one.
