NixOS Custom Command System - Complete Overview¶

Introduction¶

The NixOS custom command system provides specialized, automated workflows for managing complex NixOS infrastructure. Each command embeds best practices, safety checks, GitHub integration, and comprehensive documentation into repeatable procedures.

Architecture¶

Command Structure¶

.claude/commands/
├── flake-update.md         # Flake input management
├── update-claude-code.md   # Claude Code package updates
├── system-health-check.md  # Infrastructure monitoring
├── update-package.md       # Generic package updates
├── deploy-all.md          # Multi-host deployment
└── config-audit.md        # Configuration quality audit

Command Format¶

Each command is a Markdown file containing:

Task Overview: High-level description
Prerequisites: Required checks before execution
Step-by-Step Procedures: Detailed workflow
Success Criteria: Validation checklist
Rollback Procedures: Emergency recovery steps
Troubleshooting: Common issues and solutions
Documentation References: Links to relevant docs

Execution Flow¶

graph TD
    A[User invokes /command] --> B[Claude Code loads command]
    B --> C[Execute prerequisites check]
    C --> D{Prerequisites pass?}
    D -->|No| E[Report issues, halt]
    D -->|Yes| F[Execute main workflow]
    F --> G[Run validation checks]
    G --> H{Validation pass?}
    H -->|No| I[Report errors, rollback]
    H -->|Yes| J[Complete successfully]
    J --> K[Generate documentation]
    K --> L[Update GitHub if needed]

Command Catalog¶

1. `/flake-update` - Flake Input Management¶

Purpose: Safely update NixOS flake inputs with testing and deployment

When to Use:

Weekly maintenance schedule
After upstream security updates
When new features needed from nixpkgs
Before major infrastructure changes

Key Features:

Pre-update system checks
Parallel host testing
Smart deployment (only changed hosts)
Post-deployment verification
Automatic rollback on failure
Detailed commit documentation

Typical Duration: 10-15 minutes

Risk Level: Medium (automated testing reduces risk)

2. `/system-health-check` - Infrastructure Monitoring¶

Purpose: Comprehensive health assessment across all infrastructure

When to Use:

Weekly scheduled checks
After major deployments
When investigating issues
Before starting new projects

Key Features:

Multi-host connectivity testing
Service status verification
Resource utilization analysis
Monitoring stack validation
Media server health (Plex/NZBGet)
Security audit
Comprehensive report generation

Typical Duration: 5-10 minutes

Risk Level: None (read-only operations)

3. `/update-claude-code` - Claude Code Updates¶

Purpose: Update Claude Code package with proper testing

When to Use:

New release available
Bug fixes needed
Feature updates desired
Monthly maintenance

Key Features:

Automatic version research
Hash calculation
Multi-host testing
GitHub issue/PR creation
Deployment verification

Typical Duration: 8-12 minutes

Risk Level: Low (single package update)

4. `/update-package` - Generic Package Updates¶

Purpose: Update any package in the configuration

When to Use:

Security updates
Bug fixes
Feature requirements
Dependency updates

Key Features:

Flexible package selection
Multiple update strategies
Comprehensive testing
GitHub workflow integration
Rollback procedures

Typical Duration: 10-20 minutes (varies by package)

Risk Level: Medium (depends on package criticality)

5. `/deploy-all` - Multi-Host Deployment¶

Purpose: Deploy configuration changes to all hosts

When to Use:

After configuration changes
Post-testing deployment
System-wide updates
Coordinated rollouts

Key Features:

Pre-deployment validation
Multiple deployment strategies
Post-deployment verification
Monitoring integration
Emergency rollback
Deployment documentation

Typical Duration: 5-15 minutes (depends on strategy)

Risk Level: Medium-High (affects all hosts)

6. `/config-audit` - Configuration Quality Audit¶

Purpose: Audit configuration for anti-patterns and quality

When to Use:

Monthly quality checks
Before major releases
After large changes
Code review processes

Key Features:

Anti-pattern detection
Security audit
Module system review
Performance analysis
Code quality metrics
Comprehensive reporting
Issue creation for findings

Typical Duration: 15-25 minutes

Risk Level: None (analysis only)

Command Comparison Matrix¶

Command	Frequency	Duration	Risk	Auto-Deploy	GitHub Integration
`/flake-update`	Weekly	10-15m	Medium	Yes	Yes
`/system-health-check`	Weekly	5-10m	None	No	Optional
`/update-claude-code`	Monthly	8-12m	Low	Yes	Yes
`/update-package`	As needed	10-20m	Medium	Yes	Yes
`/deploy-all`	As needed	5-15m	Med-High	Yes	Optional
`/config-audit`	Monthly	15-25m	None	No	Yes

Integration with Infrastructure¶

GitHub Workflow Integration¶

Commands integrate with the GitHub-based workflow:

# Command creates issue
/update-claude-code
# → Creates GitHub issue with /new_task
# → Creates branch following conventions
# → Creates PR with testing evidence
# → Tracks progress in GitHub

# Command checks status
/system-health-check
# → Reviews open issues with /check_tasks
# → Reports on issue status
# → Recommends prioritization

Just Command Integration¶

Commands leverage existing just commands:

# Commands use just for:
just check-syntax    # Syntax validation
just validate-quick  # Fast configuration validation
just quick-test      # Parallel host testing
just quick-deploy    # Smart deployment
just test-host HOST  # Individual host testing

Monitoring Integration¶

Commands interact with monitoring stack:

# Health checks verify:
grafana-status       # Dashboard availability
prometheus-status    # Metrics collection
node-exporter-status # Exporter health

# Commands access monitoring:
curl http://p620:9090/api/v1/targets  # Prometheus
curl http://p620:9093/api/v2/alerts   # Alertmanager

Documentation Integration¶

Commands reference infrastructure documentation:

@docs/PATTERNS.md - NixOS best practices
@docs/NIXOS-ANTI-PATTERNS.md - What to avoid
@docs/GITHUB-WORKFLOW.md - Development workflow
@.agent-os/product/roadmap.md - Project priorities

Command Development Workflow¶

Creating New Commands¶

Identify Need

# What repetitive task needs automation?
# What workflow needs documentation?
# What process has high error risk?

Design Command

# Command structure:

- Task overview
- Prerequisites
- Step-by-step workflow
- Success criteria
- Rollback procedures
- Troubleshooting
- Documentation references

Implement Command

# Create command file
vim .claude/commands/new-command.md

# Follow existing patterns
# Include comprehensive error handling
# Add rollback procedures

Test Command

# Test execution
claude
/new-command

# Verify all steps
# Test error conditions
# Confirm rollback works

Document Command

# Add to docs/Nixos/
vim docs/Nixos/New-Command-Guide.md

# Update README
vim docs/Nixos/README.md

Create GitHub Issue

/new_task
# Track command creation
# Document design decisions
# Plan testing

Updating Existing Commands¶

# 1. Identify improvement
# - Error handling gaps
# - Missing features
# - Outdated procedures

# 2. Edit command
vim .claude/commands/COMMAND.md

# 3. Test changes
claude
/COMMAND

# 4. Update documentation
vim docs/Nixos/COMMAND-Guide.md

# 5. Commit changes
git add .claude/commands/COMMAND.md docs/Nixos/
git commit -m "feat(commands): improve /COMMAND workflow"

Best Practices¶

Command Usage¶

Do:

Read command documentation first
Review command output carefully
Follow all validation steps
Monitor after deployment
Create GitHub issues for tracking
Keep commands up-to-date

Don't:

Skip prerequisite checks
Ignore warning messages
Deploy without testing
Forget rollback procedures
Skip documentation updates

Command Development¶

Do:

Follow existing patterns
Include comprehensive error handling
Add rollback procedures
Document all steps clearly
Test thoroughly before use
Keep commands maintainable

Don't:

Create overly complex commands
Skip error handling
Forget rollback procedures
Write unclear documentation
Neglect testing

Safety Features¶

Built-in Safeguards¶

Prerequisites Checks

# Commands verify before executing:
- Git status clean
- No critical open issues
- Monitoring operational
- Hosts reachable

Validation Steps

# Commands validate at each stage:
just check-syntax     # Syntax errors
just validate-quick   # Configuration errors
just quick-test       # Build errors

Rollback Procedures

# Every command includes:
- Generation rollback: nixos-rebuild switch --rollback
- Git revert: git revert COMMIT
- Manual recovery: step-by-step instructions

Monitoring Integration

# Commands verify deployment health:
- Service status checks
- Monitoring metrics
- Alert verification
- 24-hour monitoring plan

Risk Mitigation¶

Risk	Mitigation
Syntax errors	Pre-deployment syntax validation
Build failures	Comprehensive testing before deploy
Service failures	Post-deployment verification
Configuration drift	Declarative configuration + git
Security issues	Secrets management + hardening checks
Performance issues	Resource monitoring + alerts
Human error	Automated workflows + validation

Troubleshooting¶

Command Not Executing¶

# 1. Verify command exists
ls -la .claude/commands/

# 2. Check command file syntax
cat .claude/commands/COMMAND.md

# 3. Restart Claude Code
exit
claude

Command Fails During Execution¶

# 1. Review error message
# Error messages indicate specific failure

# 2. Check prerequisites
# Ensure all prerequisites met

# 3. Use rollback procedure
# Each command includes rollback steps

# 4. Create issue for investigation
/new_task

Unexpected Results¶

# 1. Review command output
# Commands provide detailed logging

# 2. Check system state
/system-health-check

# 3. Consult documentation
cat docs/Nixos/COMMAND-Guide.md

# 4. Check recent changes
git log --oneline -10

Performance Optimization¶

Command Execution Speed¶

Optimization	Impact
Parallel testing	75% faster
Smart deployment	Skip unchanged hosts
Binary caching	Faster builds
Local testing	No network overhead

Workflow Efficiency¶

# Traditional workflow: 30+ minutes
# 1. Manual research (5-10 min)
# 2. File updates (5 min)
# 3. Hash calculation (2-5 min)
# 4. Testing (5-10 min)
# 5. GitHub workflow (5-10 min)
# 6. Deployment (5 min)

# Command workflow: 5-15 minutes
/update-package
# Everything automated

Result: 10x productivity improvement

Metrics and Monitoring¶

Command Success Metrics¶

Track command effectiveness:

# Success rate
echo "Commands run: N"
echo "Successful: N (X%)"
echo "Failed: N (X%)"

# Time savings
echo "Average time without commands: 30 min"
echo "Average time with commands: 10 min"
echo "Time saved: 20 min per operation"

# Quality improvements
echo "Issues detected before deployment: N"
echo "Rollbacks required: N"
echo "Anti-patterns eliminated: N"

Usage Statistics¶

# Track command usage
/flake-update: Weekly (52/year)
/system-health-check: Weekly (52/year)
/update-package: Monthly (12/year)
/config-audit: Monthly (12/year)

# Total operations: ~150/year
# Time saved: 50+ hours/year

Future Enhancements¶

Planned Commands¶

/backup-verify
Verify backup status
Test restore procedures
Generate backup reports
/security-audit
Security-focused audit
Vulnerability scanning
Compliance checking
/performance-analysis
Performance profiling
Bottleneck identification
Optimization recommendations
/weekly-maintenance
Automated weekly tasks
Combines health check + updates
Generates weekly report
/emergency-rollback
Fast rollback all hosts
Emergency recovery
Incident documentation

Enhancement Ideas¶

Command chaining and composition
Interactive prompts for complex decisions
Progress visualization
Notification integration (email, Slack)
Historical trend analysis
Predictive maintenance

Documentation and Learning¶

Command Documentation Structure¶

Each command has:

Command File: .claude/commands/COMMAND.md
Executable workflow
Step-by-step instructions
Code examples
Guide Document: docs/Nixos/COMMAND-Guide.md
Conceptual explanation
Use cases and examples
Best practices
Troubleshooting
Integration References
Links to pattern docs
Links to anti-pattern docs
Links to workflows

Learning Path¶

Start Here: Read README.md
Understand: Read this overview
Try Simple: Run /system-health-check
Try Complex: Run /flake-update
Master: Create custom commands

Summary¶

The NixOS custom command system provides:

Automation: Complex workflows become one-line commands Safety: Built-in validation and rollback procedures Documentation: Living documentation of procedures Quality: Automatic adherence to best practices Efficiency: 10x productivity improvement Integration: Seamless GitHub and monitoring integration Learning: Clear workflows for team members

Result: Professional, maintainable, and safe NixOS infrastructure management.

Next Steps:

Review individual command guides
Try commands in your workflow
Customize commands for your needs
Create new commands as needed
Share improvements with the team

NixOS Custom Command System - Complete Overview¶

Introduction¶

Architecture¶

Command Structure¶

Command Format¶

Execution Flow¶

Command Catalog¶

1. /flake-update - Flake Input Management¶

2. /system-health-check - Infrastructure Monitoring¶

3. /update-claude-code - Claude Code Updates¶

4. /update-package - Generic Package Updates¶

5. /deploy-all - Multi-Host Deployment¶

6. /config-audit - Configuration Quality Audit¶

Command Comparison Matrix¶

Integration with Infrastructure¶

GitHub Workflow Integration¶

Just Command Integration¶

Monitoring Integration¶

Documentation Integration¶

Command Development Workflow¶

Creating New Commands¶

Updating Existing Commands¶

Best Practices¶

Command Usage¶

Command Development¶

Safety Features¶

Built-in Safeguards¶

Risk Mitigation¶

Troubleshooting¶

Command Not Executing¶

Command Fails During Execution¶

Unexpected Results¶

Performance Optimization¶

Command Execution Speed¶

Workflow Efficiency¶

Metrics and Monitoring¶

Command Success Metrics¶

Usage Statistics¶

Future Enhancements¶

Planned Commands¶

Enhancement Ideas¶

Documentation and Learning¶

Command Documentation Structure¶

Learning Path¶

Summary¶

1. `/flake-update` - Flake Input Management¶

2. `/system-health-check` - Infrastructure Monitoring¶

3. `/update-claude-code` - Claude Code Updates¶

4. `/update-package` - Generic Package Updates¶

5. `/deploy-all` - Multi-Host Deployment¶

6. `/config-audit` - Configuration Quality Audit¶