kernel_panic

2025-02-01 · 6 min read

CI/CD Pipeline Best Practices: A Practical Guide

Lessons from building hundreds of CI/CD pipelines — what actually matters, what's overrated, and how to build pipelines your team will love.

What Makes a Good Pipeline

After building CI/CD pipelines for over a decade across every major platform, we've seen patterns that consistently work — and anti-patterns that consistently cause pain. A good pipeline is fast, reliable, and trustworthy. Your team should be able to merge code and forget about it, confident that the pipeline will catch problems or get changes safely to production.

Here's what that looks like in practice.

Speed Is a Feature

The single most impactful thing you can do for your CI/CD pipeline is make it fast. Every minute of build time is a minute where a developer is context-switching, losing focus, or stacking up another change on top of an unvalidated one.

Parallelize Everything

Don't run tests sequentially if they can run in parallel. Most CI platforms support parallel jobs natively:

# GitHub Actions example
jobs:
  lint:
    runs-on: ubuntu-latest
    steps: [...]
  test-unit:
    runs-on: ubuntu-latest
    steps: [...]
  test-integration:
    runs-on: ubuntu-latest
    steps: [...]

Three jobs running in parallel will finish in the time of the slowest one, not the sum of all three.

Cache Aggressively

Downloading dependencies on every build is one of the biggest time sinks. Cache your package manager's download directory:

- uses: actions/cache@v4
  with:
    path: ~/.npm
    key: npm-${{ hashFiles('package-lock.json') }}
    restore-keys: npm-   # on a key miss, fall back to the newest partial match

Also cache build artifacts, compiled assets, and anything else that doesn't change between commits.

Use Smart Test Splitting

If you have a large test suite, split it across multiple runners. Tools like CircleCI's test splitting or custom scripts that divide tests by timing data can cut your test time dramatically.
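On GitHub Actions, a simple version of this is a job matrix that shards the suite across runners. This sketch assumes a Jest test suite (Jest 28+ supports `--shard`); the shard count of four is arbitrary:

```yaml
# Sketch: split one test suite across four parallel runners
jobs:
  test:
    runs-on: ubuntu-22.04
    strategy:
      matrix:
        shard: [1, 2, 3, 4]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20.11.0'
      - run: npm ci
      - run: npx jest --shard=${{ matrix.shard }}/4
```

Timing-based splitting beats naive sharding when test durations are uneven, but even a dumb even split like this usually recovers most of the win.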

Reliability Matters More Than Features

A flaky pipeline is worse than no pipeline. If your team learns to ignore failures because "it's probably just that flaky test," you've lost the entire value of CI.

Fix Flaky Tests Immediately

When a test fails intermittently, stop everything and fix it. Common causes:

  • Race conditions in async tests
  • Shared state between test cases
  • Time-dependent assertions
  • Network calls to external services

Mock external dependencies, isolate test state, and use deterministic time in tests.
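As one illustration of deterministic time, here's a hypothetical expiry check tested by pinning the clock with Python's standard `unittest.mock` (the function name and timestamps are made up for the example):

```python
import time
from unittest import mock

def is_token_expired(issued_at: float, ttl: int = 3600) -> bool:
    """Hypothetical helper: has the token's time-to-live elapsed?"""
    return time.time() - issued_at > ttl

# Asserting against the real clock makes the result depend on when the test
# runs. Pinning the clock makes it deterministic on every run, every machine.
with mock.patch("time.time", return_value=1_000_000.0):
    assert not is_token_expired(issued_at=999_000.0)  # 1,000 s old: still valid
    assert is_token_expired(issued_at=990_000.0)      # 10,000 s old: expired
```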

Pin Your Dependencies

Use lock files (package-lock.json, Pipfile.lock, go.sum) and pin your CI runner versions. "Latest" is the enemy of reproducibility.

# Good
runs-on: ubuntu-22.04
- uses: actions/setup-node@v4
  with:
    node-version: '20.11.0'

# Bad
runs-on: ubuntu-latest
- uses: actions/setup-node@v4
  with:
    node-version: 'latest'

Make Builds Reproducible

Given the same commit, your pipeline should produce the same result every time. This means:

  • No reliance on mutable external state
  • Deterministic dependency resolution (lock files)
  • Pinned tool versions
  • Hermetic build environments
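
One way to approximate a hermetic environment on GitHub Actions is to run the job inside a pinned container image rather than directly on the host runner. A minimal sketch, assuming a Node.js project:

```yaml
# Sketch: pinned toolchain container instead of whatever the host runner has
jobs:
  build:
    runs-on: ubuntu-22.04
    container:
      image: node:20.11.0-bookworm   # pinned image tag, never :latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci                  # 'ci' installs exactly what the lock file specifies
      - run: npm run build
```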

Deployment Strategies

How you deploy is as important as what you deploy. The goal is to minimize risk while maximizing deployment frequency.

Start Simple, Get Fancy Later

A basic deployment strategy that works:

  1. Merge to main triggers a build
  2. Run all tests
  3. Build artifacts (Docker image, compiled binary, etc.)
  4. Deploy to staging automatically
  5. Deploy to production with a manual approval gate

This is good enough for most teams. Don't overcomplicate it until you have a reason to.
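The five steps above map onto a workflow roughly like this sketch, where `deploy.sh` is a hypothetical deployment script and the production approval gate is configured on the `production` environment in the repository settings:

```yaml
# Sketch of the basic flow: test, build, auto-deploy staging, gated production
name: deploy
on:
  push:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm test
  build:
    needs: test
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4
      - run: docker build -t app:${{ github.sha }} .
  deploy-staging:
    needs: build
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4
      - run: ./deploy.sh staging      # hypothetical script
  deploy-production:
    needs: deploy-staging
    runs-on: ubuntu-22.04
    environment: production           # manual approval gate lives here
    steps:
      - uses: actions/checkout@v4
      - run: ./deploy.sh production   # hypothetical script
```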

When to Add Canary/Blue-Green

Consider more advanced deployment strategies when:

  • You have enough traffic that errors affect many users
  • You need zero-downtime deployments
  • You want automated rollback based on metrics

Canary deployments (routing a small percentage of traffic to the new version) are excellent for catching issues that don't show up in testing. Blue-green deployments (maintaining two identical environments and switching between them) give you instant rollback capability.

Rollback Should Be One Click

Whatever your deployment strategy, rolling back should be trivial. If rolling back requires a new build, hotfix branch, and manual intervention, you'll hesitate to deploy frequently.

Best approach: keep the last N deployment artifacts available and make rollback a matter of redeploying the previous artifact.
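In GitHub Actions terms, that can be as simple as a manually triggered workflow that takes a previously built artifact tag as input. This is a sketch; `deploy.sh` is hypothetical:

```yaml
# Sketch: rollback = redeploy a previous artifact, one manual trigger
name: rollback
on:
  workflow_dispatch:
    inputs:
      tag:
        description: 'Previously built image tag to redeploy'
        required: true
jobs:
  rollback:
    runs-on: ubuntu-22.04
    environment: production
    steps:
      - uses: actions/checkout@v4
      - run: ./deploy.sh production "${{ inputs.tag }}"   # hypothetical script
```

Because the artifact already exists, no build or test stage stands between you and recovery.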

Security in the Pipeline

Your CI/CD pipeline is a high-value target. It has access to your source code, production credentials, and deployment infrastructure.

Secret Management

  • Never hardcode secrets in pipeline configurations
  • Use your CI platform's secret management
  • Rotate secrets regularly
  • Audit who has access to pipeline secrets
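
In practice, that means secrets enter the pipeline only through the platform's secret store at run time. On GitHub Actions, for example (`deploy.sh` and `API_TOKEN` are placeholder names):

```yaml
# The secret lives in the platform's store, not in this file; the platform
# masks its value in logs.
- name: Deploy
  run: ./deploy.sh production
  env:
    API_TOKEN: ${{ secrets.API_TOKEN }}
```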

Dependency Scanning

Run npm audit, pip-audit, or equivalent on every build. Fail the build on critical vulnerabilities:

- name: Security audit
  run: npm audit --audit-level=critical

Image Scanning

If you're building Docker images, scan them for known vulnerabilities before deploying:

- name: Scan image
  uses: aquasecurity/trivy-action@master   # pin this to a release tag in practice
  with:
    image-ref: your-image:latest
    severity: 'CRITICAL,HIGH'

Pipeline as Code

Your pipeline definition should live in your repository, be version-controlled, and go through the same review process as your application code.

Benefits of Pipeline as Code

  • Auditability: You can see who changed what and when
  • Reproducibility: Any commit can be rebuilt with its exact pipeline
  • Portability: Move between CI platforms without vendor lock-in
  • Collaboration: Pipeline changes go through PR review

Keep It DRY

Use reusable workflows, shared actions, or templates to avoid duplicating pipeline logic across repositories:

# .github/workflows/deploy.yml
jobs:
  deploy:
    uses: your-org/.github/.github/workflows/deploy-template.yml@main
    with:
      environment: production
    secrets: inherit

Measuring Pipeline Health

You can't improve what you don't measure. Track these metrics:

  • Build time: P50 and P95 (aim for under 10 minutes)
  • Success rate: Should be above 95%
  • Deployment frequency: How often you ship to production
  • Lead time: Commit to production time
  • MTTR: Mean time to recover from a failed deployment

Deployment frequency, lead time, and MTTR are three of the four DORA metrics (change failure rate is the fourth, and worth tracking too); together they're the best-studied indicators of your team's delivery performance.

Common Anti-Patterns

The "Test Everything in CI" Trap

Not every test needs to run on every commit. Run fast unit tests on every push, integration tests on PR, and full end-to-end suites on merge to main or on a schedule.
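One way to express those tiers on GitHub Actions is to gate jobs on the event that triggered the workflow. A sketch, using the same `steps: [...]` placeholders as the examples above:

```yaml
# Sketch: tier the suites by trigger instead of running everything everywhere
on:
  push:                     # every push: fast unit tests
  pull_request:             # PRs: integration tests too
  schedule:
    - cron: '0 3 * * *'     # nightly: full end-to-end suite
jobs:
  unit:
    runs-on: ubuntu-22.04
    steps: [...]
  integration:
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-22.04
    steps: [...]
  e2e:
    if: github.event_name == 'schedule'
    runs-on: ubuntu-22.04
    steps: [...]
```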

The "One Pipeline to Rule Them All" Problem

A monorepo with 50 services doesn't need one pipeline that builds everything. Use change detection to only build and test what changed.
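The simplest form of change detection on GitHub Actions is a built-in `paths` filter: give each service its own workflow that fires only when its files change. (Dedicated change-detection actions exist for finer-grained cases; the paths below are illustrative.)

```yaml
# Sketch: per-service workflow that runs only when its own code changes
name: api-service
on:
  push:
    paths:
      - 'services/api/**'
      - 'shared/**'         # shared code this service depends on
jobs:
  build-and-test:
    runs-on: ubuntu-22.04
    steps: [...]
```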

Manual Gates Everywhere

Each manual approval gate slows your pipeline. Use them sparingly — typically only for production deployment. Everything else should be automated.


Building or optimizing CI/CD pipelines? We've been doing this for a long time and we'd love to help. Let's talk.
