Platforms like sadservers.com and iximiuz Labs are excellent for SRE scenario practice: real infrastructure, well-designed challenges, no setup required. I use them regularly.
sre-dojo is a local alternative built on the same idea: broken environments you debug and fix, with a single command to validate the solution. It runs on OrbStack on your Mac, lives in a git repo, and takes about 30 seconds to spin up any scenario.
The repo: github.com/75asu/sre-dojo
The Design
Each scenario is a folder with four files:
scenarios/docker/chennai/
├── scenario.yaml # metadata: title, difficulty, tags, description
├── orb-setup.sh # creates the OrbStack machine and introduces the break
├── verify.sh # runs inside the machine and confirms the fix
└── README.md # problem statement, no hints
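If you are adding a scenario, a quick check that the folder is complete can save a failed run. The following is a minimal sketch, not part of the repo as far as the post shows; `missing_files` is a hypothetical helper name, and the only thing it relies on is the four-file layout listed above.

```python
from pathlib import Path

# The four files every scenario folder is expected to contain.
REQUIRED_FILES = ("scenario.yaml", "orb-setup.sh", "verify.sh", "README.md")


def missing_files(scenario_dir):
    """Return the required files that are absent from a scenario folder."""
    folder = Path(scenario_dir)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]
```

Calling `missing_files("scenarios/docker/chennai")` on a complete scenario returns an empty list.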
The runner (lab.py) orchestrates everything:
./lab.py start chennai # provisions the machine, introduces the break
./lab.py verify chennai # runs verify.sh, prints pass/fail
./lab.py stop chennai # destroys the machine
You get a broken environment in 30 seconds. Fix it. Run verify. Done.
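To make the orchestration concrete, here is a rough sketch of how a runner like lab.py could map its three subcommands to processes. It is not the actual lab.py. It assumes the machine is named after the scenario, that `orb -m <machine> <command>` runs a command inside that machine, and that the machine can read the repo at the same path as on the Mac. `find_scenario` and `build_command` are hypothetical names.

```python
import argparse
import subprocess
from pathlib import Path

SCENARIO_ROOT = Path("scenarios")  # assumed layout: scenarios/<type>/<name>/


def find_scenario(name, root=SCENARIO_ROOT):
    """Locate scenarios/<type>/<name>/ without knowing the type up front."""
    matches = sorted(p for p in Path(root).glob(f"*/{name}") if p.is_dir())
    if not matches:
        raise FileNotFoundError(f"no scenario named {name!r} under {root}")
    return matches[0]


def build_command(action, name, root=SCENARIO_ROOT):
    """Map a subcommand to the process it would run (nothing is executed here)."""
    folder = find_scenario(name, root).resolve()
    if action == "start":
        # orb-setup.sh creates the machine itself, so it runs on the Mac.
        return ["bash", str(folder / "orb-setup.sh")]
    if action == "verify":
        # Assumption: the machine shares the scenario's name and sees the
        # repo at the same absolute path as the host.
        return ["orb", "-m", name, "bash", str(folder / "verify.sh")]
    if action == "stop":
        return ["orb", "delete", name]
    raise ValueError(f"unknown action: {action}")


def main(argv):
    """Parse `<action> <scenario>` and return the child's exit code."""
    parser = argparse.ArgumentParser(prog="lab.py")
    parser.add_argument("action", choices=["start", "verify", "stop"])
    parser.add_argument("scenario")
    args = parser.parse_args(argv)
    return subprocess.run(build_command(args.action, args.scenario)).returncode
```

Keeping `build_command` free of side effects means the mapping can be checked without OrbStack installed; a real script would end with `sys.exit(main(sys.argv[1:]))`.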
Why OrbStack
OrbStack machines are lightweight Linux VMs on macOS. They start in seconds, support networking between machines, and are fully disposable. orb create ubuntu:25.04 mylab gives you an Ubuntu machine; orb delete mylab removes it cleanly.
The alternative was Docker containers, but containers don’t run systemd out of the box, can’t realistically simulate disk-level failures, and are awkward for some networking scenarios. OrbStack machines behave like real Linux servers.
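The disposability is easy to lean on from a script. As a small sketch, assuming only the `orb create` and `orb delete` commands mentioned above and that `orb delete` completes without further input, a context manager can guarantee the machine is removed even when something fails halfway. `disposable_machine` is a hypothetical name, and the `run` parameter exists only so the function can be exercised without OrbStack.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def disposable_machine(name, image="ubuntu:25.04", run=subprocess.run):
    """Create an OrbStack machine and delete it on exit, even after an error."""
    run(["orb", "create", image, name], check=True)
    try:
        yield name
    finally:
        # Runs on normal exit and when the body raises.
        run(["orb", "delete", name], check=True)
```

Used as `with disposable_machine("mylab") as machine: ...`, the body can experiment freely and the machine is gone afterwards.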
The Break Pattern
The orb-setup.sh script does two things: sets up the environment and introduces the break.
For a RabbitMQ scenario, setup looks like this:
# install deps
pip3 install pika
# make sure the target directory exists before writing into it
mkdir -p ~/app
# deploy rabbitmq with wrong credentials
cat > ~/app/docker-compose.yml << EOF
services:
  rabbitmq:
    image: rabbitmq:3-management
    environment:
      RABBITMQ_DEFAULT_USER: wronguser
      RABBITMQ_DEFAULT_PASS: wrongpass
    ports:
      - "5672:5672"
EOF
docker compose -f ~/app/docker-compose.yml up -d
# drop scripts that expect the default credentials
cat > ~/producer.py << 'EOF'
# ... hardcoded to guest/guest
EOF
The scripts expect guest/guest. The container runs with wronguser/wrongpass. The user has to find and fix the mismatch. Setup and break in one script — reproducible every time.
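The setup script elides the body of producer.py. For illustration only, a producer hard-coded to guest/guest might look like the sketch below. This is not the script from the repo. The queue name `dojo` is invented, and the only details taken from the post are the credentials, the port, the message passed as the first argument, and the "Message sent" line that verify.sh looks for.

```python
# Hard-coded on purpose: these are the credentials the scenario's scripts expect.
AMQP_URL = "amqp://guest:guest@localhost:5672/%2F"
QUEUE = "dojo"  # hypothetical queue name


def send(message, url=AMQP_URL, queue=QUEUE):
    """Publish one message; raises if the broker rejects the credentials."""
    import pika  # imported lazily so the module loads without pika installed

    connection = pika.BlockingConnection(pika.URLParameters(url))
    try:
        channel = connection.channel()
        channel.queue_declare(queue=queue)
        channel.basic_publish(exchange="", routing_key=queue, body=message.encode())
    finally:
        connection.close()
    print("Message sent")


def main(argv):
    """argv mirrors sys.argv: script name followed by the message."""
    if len(argv) != 2:
        print("usage: producer.py <message>")
        return 2
    send(argv[1])
    return 0
```

Against the broken container, the connection attempt fails with an authentication error before anything is published, which is the symptom the user starts from.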
The Verify Pattern
verify.sh is the source of truth. It doesn’t hint at the solution — it just tests the outcome:
#!/bin/bash
# `|| true` stops `set -e` from aborting on a failing script before the FAIL message prints
set -e
output=$(python3 ~/producer.py hello-lwc 2>&1) || true
if ! echo "$output" | grep -q "Message sent"; then
    echo "FAIL: producer did not send message"
    exit 1
fi
result=$(python3 ~/consumer.py 2>&1) || true
if [ "$result" != "hello-lwc" ]; then
    echo "FAIL: consumer got '$result', expected 'hello-lwc'"
    exit 1
fi
echo "PASS"
Run ./lab.py verify chennai. If it exits 0, you fixed it. No ambiguity.
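The exit-code contract is what lets the runner stay dumb. As a hedged sketch of the runner side (how lab.py actually does this is not shown in the post, and `run_verify` is a hypothetical name), the only thing that needs interpreting is the return code; the script's output is passed through for the user to read.

```python
import subprocess


def run_verify(command):
    """Run a verify command; exit code 0 is the only thing that counts as a pass."""
    proc = subprocess.run(command, capture_output=True, text=True)
    passed = proc.returncode == 0
    # Combine both streams so FAIL messages and stray tracebacks are both visible.
    return passed, (proc.stdout + proc.stderr).strip()
```

The runner never parses the text of verify.sh's output, so a scenario author can change the wording of the messages without touching lab.py.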
Scenario Types
Three categories so far:
- Linux — filesystem, systemd, cron, permissions, process management
- Docker — compose, networking, Caddy, RabbitMQ, container debugging
- Kubernetes — StatefulSet scheduling, ConfigMap drift, CrashLoopBackOff, Helm
Each scenario is tagged by tool so you can filter by what you want to practice. scenario.yaml carries the metadata:
name: chennai
title: "Fix the RabbitMQ Cluster"
difficulty: medium
type: docker
status: ready
tags: [rabbitmq, docker, applications]
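Filtering on that metadata is a few lines once the files are loaded. The sketch below assumes PyYAML for parsing (the post does not say what lab.py uses) and the `scenarios/<type>/<name>/scenario.yaml` layout. `filter_scenarios` and `load_scenarios` are hypothetical names.

```python
from pathlib import Path


def filter_scenarios(scenarios, tag=None, type_=None, difficulty=None):
    """Keep the scenarios that match every filter that was given."""
    selected = []
    for meta in scenarios:
        if tag is not None and tag not in meta.get("tags", []):
            continue
        if type_ is not None and meta.get("type") != type_:
            continue
        if difficulty is not None and meta.get("difficulty") != difficulty:
            continue
        selected.append(meta)
    return selected


def load_scenarios(root="scenarios"):
    """Read every scenarios/<type>/<name>/scenario.yaml into a dict."""
    import yaml  # PyYAML, third-party; an assumption, not confirmed by the post

    paths = sorted(Path(root).glob("*/*/scenario.yaml"))
    return [yaml.safe_load(p.read_text()) for p in paths]
```

For example, `filter_scenarios(load_scenarios(), tag="rabbitmq")` would return the chennai metadata shown above along with any other scenario carrying that tag.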
Takeaway
What made this worth building is full local control. A new scenario takes about 20 minutes to write, and I can design breaks that mirror real incidents I’ve dealt with: credential mismatches, misconfigured health checks, a PodDisruptionBudget (PDB) blocking a node drain. That specificity is hard to get from a public platform.
The repo is open: github.com/75asu/sre-dojo. PRs welcome if you want to add a scenario.