Introducing Pilum: A Recipe-Driven Deployment Orchestrator

Today I’m open-sourcing Pilum, a multi-service deployment orchestrator written in Go. This post covers the context, architecture decisions, and technical implementation details.

Context

At SID Technologies, we run services across multiple cloud providers and distribution channels. A typical release involves:

  • API services deployed to GCP Cloud Run
  • CLI tools distributed via Homebrew
  • Background workers on Cloud Run with different scaling configs
  • Static assets synced to S3

Each platform has its own deployment CLI, authentication model, and configuration format. The cognitive overhead compounds: remembering gcloud run deploy flags versus brew tap semantics versus aws s3 sync options. Deployments became a copy-paste ritual of shell scripts, and the scripts inevitably drifted out of sync.

I wanted a single command that could deploy any service to any target, with the deployment logic defined declaratively rather than imperatively.

Inspiration: The Roman Pilum

The pilum was the javelin of the Roman legions. Its design was elegant in its specificity: a long iron shank connected to a wooden shaft, with a weighted pyramidal tip. It was engineered for a single purpose - to be thrown once and penetrate the target.

The soft iron shank would bend on impact, so the weapon could not be thrown back, and a pilum lodged in an enemy's shield was nearly impossible to remove, rendering the shield useless. One weapon. One throw. Mission accomplished.

This resonated with what I wanted from a deployment tool: define the target once, execute once, hit precisely.

The Problem with Existing Tools

Why not Terraform? Or Pulumi? Or just shell scripts?

Terraform/Pulumi are infrastructure-as-code tools optimized for provisioning resources. They’re declarative about what exists, not how to deploy code to those resources. Deploying a new version of a Cloud Run service requires running gcloud commands, not HCL.

Shell scripts work until they don’t. They’re imperative, hard to test, and the “deployment logic” gets scattered across Makefiles, CI configs, and random bash files. Adding a new provider means duplicating logic.

Platform-specific CLIs (goreleaser, ko, etc.) are excellent but single-purpose. You still need orchestration to coordinate multiple services across multiple platforms.

What I wanted:

  1. Declarative service configuration (YAML)
  2. Reusable deployment workflows (recipes)
  3. Parallel execution with step barriers
  4. Provider-agnostic core with pluggable handlers

Architecture

Pilum’s architecture separates three concerns:

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  Service Config │     │     Recipe      │     │    Handlers     │
│  (service.yaml) │────▶│  (recipe.yaml)  │────▶│  (Go functions) │
└─────────────────┘     └─────────────────┘     └─────────────────┘
      WHAT                    HOW                   IMPLEMENTATION

Service configs declare what you’re deploying: name, provider, region, build settings. These live in your repo alongside your code.

Recipes define how to deploy to a provider: the ordered sequence of steps, required fields, timeouts. These are YAML files that ship with Pilum.

Handlers implement the actual commands: building Docker images, pushing to registries, calling cloud CLIs. These are Go functions registered at startup.

Service Configuration

A minimal service.yaml:

name: api-gateway
provider: gcp
project: my-project
region: us-central1

build:
  language: go
  version: "1.23"

The provider field determines which recipe is used. All other fields are validated against that recipe’s requirements.
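
Internally, that file decodes into a Go struct along these lines. This is a simplified sketch rather than the actual type: the fields are inferred from the YAML above, and the Raw catch-all map for custom recipe fields is illustrative.

// ServiceInfo is a sketch of the decoded service.yaml (illustrative, not the real type).
type ServiceInfo struct {
    Name     string `yaml:"name"`
    Provider string `yaml:"provider"`
    Project  string `yaml:"project"`
    Region   string `yaml:"region"`

    Build struct {
        Language string `yaml:"language"`
        Version  string `yaml:"version"`
    } `yaml:"build"`

    // Raw keeps the full decoded config so recipes can require fields beyond
    // the typed ones above (how this is actually stored is simplified here).
    Raw map[string]any `yaml:"-"`
}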

Recipe System

Recipes are the core abstraction. Here’s the GCP Cloud Run recipe:

name: gcp-cloud-run
description: Deploy to Google Cloud Run
provider: gcp
service: cloud_run

required_fields:
  - name: project
    description: GCP project ID
    type: string
  - name: region
    description: GCP region to deploy to
    type: string

steps:
  - name: build binary
    execution_mode: service_dir
    timeout: 300

  - name: build docker image
    execution_mode: service_dir
    timeout: 300

  - name: publish to registry
    execution_mode: root
    timeout: 120

  - name: deploy to cloud run
    execution_mode: root
    timeout: 180
    default_retries: 2

The recipe declares:

  • Required fields: Validated before any execution. If your service.yaml is missing project, you get an error immediately, not 10 minutes into a build.
  • Steps: Ordered sequence of operations. Each step has a name, execution mode, and timeout.
  • Execution mode: service_dir runs in the service’s directory, root runs from the project root.
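
On the Go side, a recipe decodes into types roughly like the following. This is a sketch inferred from the YAML above and the validation code below; the real structs may differ.

// Recipe, RequiredField, and Step are sketches of the decoded recipe.yaml.
type Recipe struct {
    Name           string          `yaml:"name"`
    Description    string          `yaml:"description"`
    Provider       string          `yaml:"provider"`
    Service        string          `yaml:"service"`
    RequiredFields []RequiredField `yaml:"required_fields"`
    Steps          []Step          `yaml:"steps"`
}

type RequiredField struct {
    Name        string `yaml:"name"`
    Description string `yaml:"description"`
    Type        string `yaml:"type"`
    Default     string `yaml:"default"` // optional; used by validation below
}

type Step struct {
    Name           string `yaml:"name"`
    ExecutionMode  string `yaml:"execution_mode"` // "service_dir" or "root"
    Timeout        int    `yaml:"timeout"`        // presumably seconds
    DefaultRetries int    `yaml:"default_retries"`
    Command        string `yaml:"command"` // optional; see variable substitution below
}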

Recipe Validation

The validation logic uses reflection to check service configs against recipe requirements:

func (r *Recipe) ValidateService(svc *serviceinfo.ServiceInfo) error {
    for _, field := range r.RequiredFields {
        value := getServiceField(svc, field.Name)
        if value == "" && field.Default == "" {
            return fmt.Errorf("recipe '%s' requires field '%s': %s",
                r.Name, field.Name, field.Description)
        }
    }
    return nil
}

The getServiceField function first checks a hardcoded map of common fields, then falls back to the raw config map, and finally uses reflection as a last resort. This gives us type safety for known fields while remaining flexible for custom recipe requirements.
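
A simplified version of that lookup, with the reflection fallback elided. The hardcoded field map and the Raw config map carry over from the sketches above and are illustrative.

func getServiceField(svc *serviceinfo.ServiceInfo, name string) string {
    // 1. Hardcoded map of common, typed fields (the type-safe path).
    known := map[string]string{
        "name":     svc.Name,
        "provider": svc.Provider,
        "project":  svc.Project,
        "region":   svc.Region,
    }
    if v, ok := known[name]; ok && v != "" {
        return v
    }

    // 2. Fall back to the raw config map for custom recipe fields.
    if v, ok := svc.Raw[name].(string); ok {
        return v
    }

    // 3. Last resort: reflection over the struct fields (elided in this sketch).
    return ""
}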

Command Registry

Steps map to handlers via a pattern-matching registry:

type StepHandler func(ctx StepContext) any

type CommandRegistry struct {
    handlers map[string]StepHandler
}

func (cr *CommandRegistry) Register(pattern string, provider string, handler StepHandler) {
    key := cr.buildKey(pattern, provider)
    cr.handlers[key] = handler
}

func (cr *CommandRegistry) GetHandler(stepName string, provider string) (StepHandler, bool) {
    // Try provider-specific first, fall back to generic
    // Pattern matching is case-insensitive with partial match support
}

This design allows:

  • Generic handlers (build works for any provider)
  • Provider-specific overrides (deploy:gcp vs deploy:aws)
  • Partial matching (build binary matches the build handler)
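
A minimal sketch of that lookup. The key scheme ("pattern" for generic handlers, "pattern:provider" for overrides) and the substring-based partial matching are simplified stand-ins for the real matching rules.

func (cr *CommandRegistry) buildKey(pattern, provider string) string {
    key := strings.ToLower(pattern)
    if provider != "" {
        key += ":" + strings.ToLower(provider)
    }
    return key
}

func (cr *CommandRegistry) GetHandler(stepName, provider string) (StepHandler, bool) {
    name := strings.ToLower(stepName)

    // Exact match: provider-specific key first, then the generic key.
    for _, key := range []string{cr.buildKey(name, provider), cr.buildKey(name, "")} {
        if h, ok := cr.handlers[key]; ok {
            return h, true
        }
    }

    // Partial match: "build binary" resolves to a handler registered as "build",
    // skipping handlers that belong to a different provider.
    for key, h := range cr.handlers {
        parts := strings.SplitN(key, ":", 2)
        if len(parts) == 2 && parts[1] != strings.ToLower(provider) {
            continue
        }
        if strings.Contains(name, parts[0]) {
            return h, true
        }
    }
    return nil, false
}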

Handlers return any because commands can be strings or string slices:

func buildHandler(ctx StepContext) any {
    return []string{
        "go", "build",
        "-ldflags", fmt.Sprintf("-X main.version=%s", ctx.Tag),
        "-o", fmt.Sprintf("dist/%s", ctx.Service.Name),
        ".",
    }
}
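
Wiring this up at startup might look like the following. The registerDefaultHandlers function, the empty provider string meaning "generic", and the inline GCP deploy handler are illustrative, not the actual registration code.

func registerDefaultHandlers(cr *CommandRegistry) {
    // Generic handler: matched for any provider.
    cr.Register("build", "", buildHandler)

    // Provider-specific override: only used when provider == "gcp".
    cr.Register("deploy", "gcp", func(ctx StepContext) any {
        // Returning a single command string instead of a []string is also allowed.
        return fmt.Sprintf("gcloud run deploy %s --region=%s --project=%s",
            ctx.Service.Name, ctx.Service.Region, ctx.Service.Project)
    })
}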

Parallel Execution with Step Barriers

The orchestrator executes services in parallel within each step, but steps execute sequentially:

func (r *Runner) Run() error {
    maxSteps := r.findMaxSteps()

    for stepIdx := 0; stepIdx < maxSteps; stepIdx++ {
        err := r.executeStep(stepIdx)
        if err != nil {
            return err // Fail fast
        }
    }
    return nil
}

Within each step, a worker pool processes services concurrently:

func (r *Runner) executeTasksParallel(tasks []stepTask) error {
    semaphore := make(chan struct{}, r.getWorkerCount())
    var wg sync.WaitGroup

    for _, t := range tasks {
        wg.Add(1)
        go func(task stepTask) {
            defer wg.Done()
            semaphore <- struct{}{}        // acquire worker slot
            defer func() { <-semaphore }() // release worker slot

            result := r.executeTask(task.service, task.step)
            _ = result // ... collect results and surface errors (elided)
        }(t)
    }

    wg.Wait()
    return nil
}

This means:

  • All services build in parallel (step 1)
  • Once all builds complete, all pushes happen in parallel (step 2)
  • Once all pushes complete, all deploys happen in parallel (step 3)

The barrier between steps ensures dependencies are satisfied. You can’t push an image that hasn’t been built.
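
The glue between the two snippets above is a per-step fan-out, roughly like this (a simplified sketch; r.services, r.recipeFor, and the handling of recipes with fewer steps are illustrative, not the exact implementation):

func (r *Runner) executeStep(stepIdx int) error {
    // Collect this step for every service that still has work at this index;
    // services whose recipes have fewer steps simply sit this round out.
    var tasks []stepTask
    for _, svc := range r.services {
        recipe := r.recipeFor(svc)
        if stepIdx < len(recipe.Steps) {
            tasks = append(tasks, stepTask{service: svc, step: recipe.Steps[stepIdx]})
        }
    }
    // This call returns only after every task in the step has finished,
    // which is what creates the barrier between steps.
    return r.executeTasksParallel(tasks)
}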

Variable Substitution

Recipe commands support variable substitution:

func (r *Runner) substituteVars(cmd any, svc serviceinfo.ServiceInfo) any {
    replacer := strings.NewReplacer(
        "${name}", svc.Name,
        "${service.name}", svc.Name,
        "${provider}", svc.Provider,
        "${region}", svc.Region,
        "${project}", svc.Project,
        "${tag}", r.options.Tag,
    )
    // Handle string and []string commands ([]any is handled the same way).
    switch c := cmd.(type) {
    case string:
        return replacer.Replace(c)
    case []string:
        out := make([]string, len(c))
        for i, s := range c {
            out[i] = replacer.Replace(s)
        }
        return out
    }
    return cmd
}

This allows recipes to use service-specific values without hardcoding:

steps:
  - name: deploy
    command: gcloud run deploy ${name} --region=${region} --project=${project}
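
For the api-gateway service defined earlier, that step resolves to:

gcloud run deploy api-gateway --region=us-central1 --project=my-project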

Why Open Source

A few reasons:

Dogfooding: Pilum deploys itself via Homebrew. Every release exercises the tool’s own recipe system. If it breaks, we feel it immediately.

Extensibility: The recipe system is designed for customization. You can add new providers by creating a YAML file - no Go code required for basic workflows. For complex providers, you register handlers.

Validation: Open source means more eyes on the code. Deployment tools have security implications. I’d rather have the community find issues than discover them in production.

Getting Started

Install via Homebrew:

brew tap sid-technologies/pilum
brew install pilum

Create a service.yaml in your project:

name: my-service
provider: gcp
project: my-gcp-project
region: us-central1

Validate your configuration:

pilum check

Deploy:

pilum deploy --tag=v1.0.0

Preview without executing:

pilum deploy --dry-run --tag=v1.0.0

What’s Next

Current providers: GCP Cloud Run, Homebrew, AWS Lambda (in progress).

On the roadmap:

  • Kubernetes (kubectl apply workflows)
  • AWS ECS
  • GitHub Releases integration
  • Parallel recipe discovery across monorepos

The code is on GitHub. The landing page is at pilum.dev.