kernel_panic

2025-02-15 · 7 min read

Cloud Migration Checklist: Planning Your Move to AWS/GCP

A practical, step-by-step checklist for planning and executing a cloud migration without losing your mind (or your data).

Before You Start

Cloud migration is one of those projects that seems straightforward until you're in the middle of it. The difference between a smooth migration and a nightmare usually comes down to preparation. This checklist is based on lessons learned from dozens of migrations — the gotchas, the things that break at 2 AM, and the shortcuts that aren't actually shortcuts.

Phase 1: Assessment

Before writing a single line of Terraform, you need to understand what you're working with.

Inventory Everything

  • Servers: What's running, how much CPU/RAM/disk, what OS version
  • Databases: Type, version, size, replication setup, backup schedule
  • Storage: File systems, object storage, total data volume
  • Networking: VPNs, firewalls, load balancers, DNS configuration
  • External services: Third-party APIs, SaaS integrations, CDN configuration
  • Scheduled jobs: Cron jobs, batch processes, data pipelines

This step is tedious but critical. You will discover servers nobody knew about. You will find cron jobs that haven't run in years but might still be important. Document everything.

Map Dependencies

Draw a diagram of how your services talk to each other. For each service, document:

  • What it depends on (databases, APIs, message queues)
  • What depends on it (other services, external consumers)
  • Communication protocols (HTTP, gRPC, direct DB connections)
  • Latency sensitivity (can it tolerate cross-region calls?)

This dependency map drives your migration order. Services with the fewest dependencies migrate first, because moving them puts the fewest connections at risk.

Establish Baselines

Before migrating, measure your current performance:

  • Response times: P50, P95, P99 for all endpoints
  • Error rates: By service and by endpoint
  • Resource utilization: CPU, memory, disk I/O, network
  • Traffic patterns: Peak hours, seasonal variations

You'll need these numbers to validate that the migration didn't degrade performance.
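As a minimal sketch of how the latency percentiles above can be computed from raw samples, the snippet below uses only the Python standard library. The function name and the sample data are made up for illustration; in practice the samples would come from your access logs or APM tool.

```python
from statistics import quantiles


def latency_baseline(samples_ms):
    """Return P50/P95/P99 for a list of response times in milliseconds."""
    # n=100 yields the 99 cut points that separate percentiles 1 through 99.
    cuts = quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


# Hypothetical samples: 1 ms through 101 ms, evenly spaced.
samples = list(range(1, 102))
baseline = latency_baseline(samples)
print(baseline)
```

Store the result per endpoint alongside the date and traffic level, so the post-migration comparison is like for like.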

Phase 2: Architecture Design

Choose Your Target Architecture

Don't just "lift and shift" your existing architecture to the cloud. This is your chance to make improvements — but be strategic about what you change.

Lift and shift (rehost): Move VMs as-is to cloud equivalents. Fastest to execute but doesn't leverage cloud-native benefits.

Replatform: Minor adjustments like moving to managed databases (RDS, Cloud SQL) instead of self-managed. Good balance of speed and improvement.

Refactor: Redesign for cloud-native patterns (containers, serverless, managed services). Most beneficial but most time-consuming.

Our recommendation: replatform for the initial migration, then refactor incrementally once you're in the cloud.

Design for Failure

Cloud environments fail differently from on-premises environments. Design for it:

  • Use multiple availability zones for redundancy
  • Implement health checks and auto-healing
  • Design services to be stateless where possible
  • Use managed services to offload operational burden
  • Plan for AZ failures, region failures, and service outages

Infrastructure as Code

Define everything in Terraform, Pulumi, or your IaC tool of choice before the migration. This gives you:

  • A reviewable, version-controlled infrastructure definition
  • The ability to spin up identical environments for testing
  • A known-good state to roll back to if a change goes wrong (stateful resources such as databases still need their own restore plan)
  • Documentation that stays in sync with reality

Phase 3: Security & Compliance

IAM Strategy

Design your IAM structure before creating any resources:

  • Use separate accounts/projects for dev, staging, and production
  • Implement least-privilege access with role-based policies
  • Enable MFA for all human users
  • Use service accounts with scoped permissions for applications
  • Set up audit logging from day one
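One way to keep least-privilege honest is to lint policies before they are applied. The sketch below assumes AWS-style JSON policy documents (a "Statement" list with "Effect", "Action", and "Resource" fields) and flags Allow statements that use wildcards. The function name and the example policy are hypothetical, and a flagged statement is a prompt for review rather than proof of a problem, since some actions only work with a "*" resource.

```python
def overly_broad_statements(policy):
    """Return the Sids of Allow statements with a wildcard action or resource."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a lone statement may not be wrapped in a list
        statements = [statements]

    def as_list(value):
        # Action and Resource may each be a single string or a list of strings.
        if value is None:
            return []
        return [value] if isinstance(value, str) else list(value)

    flagged = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = as_list(stmt.get("Action"))
        wide_action = any(a == "*" or a.endswith(":*") for a in actions)
        wide_resource = "*" in as_list(stmt.get("Resource"))
        if wide_action or wide_resource:
            flagged.append(stmt.get("Sid", "<no Sid>"))
    return flagged


# Hypothetical policy for illustration.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "ReadReports", "Effect": "Allow",
         "Action": ["s3:GetObject"], "Resource": "arn:aws:s3:::example-reports/*"},
        {"Sid": "AdminEverything", "Effect": "Allow",
         "Action": "*", "Resource": "*"},
        {"Sid": "AllOfS3", "Effect": "Allow",
         "Action": "s3:*", "Resource": "arn:aws:s3:::example-reports"},
    ],
}
print(overly_broad_statements(policy))
```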

Network Security

  • Design your VPC/VNet with proper CIDR planning (leave room to grow)
  • Segment networks by environment and sensitivity
  • Use private subnets for databases and internal services
  • Implement security groups/firewall rules with deny-by-default
  • Set up VPN or private connectivity for hybrid scenarios
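CIDR planning is easier to review when the address plan is written down as code. The following sketch uses Python's standard ipaddress module to carve a VPC range into equal subnets, keep spare blocks for growth, and check for overlap with an on-premises range. All ranges and subnet names are made-up examples, not recommendations.

```python
import ipaddress

# Hypothetical address plan: one /16 for the VPC, carved into /20 subnets.
vpc = ipaddress.ip_network("10.20.0.0/16")
subnets = list(vpc.subnets(new_prefix=20))  # 16 subnets of 4,096 addresses each

plan = {
    "public-a": subnets[0],
    "public-b": subnets[1],
    "private-app-a": subnets[2],
    "private-app-b": subnets[3],
    "private-db-a": subnets[4],
    "private-db-b": subnets[5],
}
spare = subnets[len(plan):]  # left unallocated as room to grow

# Hypothetical on-premises range reachable over the VPN.
on_prem = ipaddress.ip_network("10.0.0.0/16")
assert not vpc.overlaps(on_prem), "VPC range collides with the on-prem network"

for name, net in plan.items():
    print(f"{name:15} {net}  ({net.num_addresses} addresses)")
print(f"{len(spare)} spare /20 blocks")
```

Cloud providers reserve a few addresses in every subnet, so the usable count is slightly lower than num_addresses suggests.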

Data Protection

  • Enable encryption at rest for all storage (databases, object storage, volumes)
  • Enable encryption in transit (TLS everywhere)
  • Plan your key management strategy (cloud KMS vs. self-managed)
  • Implement a backup strategy with tested restore procedures
  • Document data residency requirements (region restrictions)

Phase 4: Migration Execution

Set Up the Landing Zone

Before migrating any workloads:

  1. Create accounts/projects with proper organizational hierarchy
  2. Configure networking (VPCs, subnets, peering, DNS)
  3. Set up IAM roles and policies
  4. Enable logging and monitoring
  5. Configure backup policies
  6. Validate security controls

Migration Order

Migrate in this order to minimize risk:

  1. Stateless services with few dependencies (lowest risk)
  2. Internal tools that can tolerate downtime
  3. Read replicas of databases (test without affecting production)
  4. Application tier (with database connections pointing to existing databases)
  5. Databases (the scariest part — do this last)
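The ordering above can be drafted mechanically from the dependency map. This is a rough sketch under simple assumptions: each service records what it depends on and whether it holds state, stateless services are ranked by how many things they depend on (directly or indirectly), and stateful services go last. The Service class and the example inventory are hypothetical, and the output is a first draft for humans to adjust, not a final plan.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    name: str
    depends_on: set = field(default_factory=set)
    stateful: bool = False  # databases, queues, anything holding data


def transitive_deps(name, services, seen=None):
    """Return every service reachable from `name` via depends_on."""
    seen = set() if seen is None else seen
    for dep in services[name].depends_on:
        if dep not in seen:
            seen.add(dep)
            transitive_deps(dep, services, seen)
    return seen


def draft_migration_order(services):
    """Stateless services first (fewest dependencies first), stateful last."""
    def sort_key(svc):
        return (svc.stateful, len(transitive_deps(svc.name, services)), svc.name)
    return [svc.name for svc in sorted(services.values(), key=sort_key)]


# Hypothetical inventory for illustration.
inventory = [
    Service("web-frontend", {"orders-api", "auth-service"}),
    Service("orders-api", {"orders-db", "auth-service"}),
    Service("auth-service", {"users-db"}),
    Service("orders-db", stateful=True),
    Service("users-db", stateful=True),
]
services = {svc.name: svc for svc in inventory}
print(draft_migration_order(services))
```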

Database Migration

Database migration deserves special attention. There are three common approaches, each trading downtime against complexity:

Continuous replication: Set up replication from source to target, let it sync, then cut over. Best option when available (AWS DMS, GCP Database Migration Service).

Dump and restore: Take a backup, restore to the new database, update connection strings. Requires downtime proportional to database size.

Dual-write: Application writes to both old and new databases during transition. Zero-downtime, but complex: when one of the two writes fails, the databases drift apart and need reconciling. Only use if replication isn't an option.
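To make the dual-write idea concrete, here is a deliberately simplified in-memory sketch. It assumes the old database stays the source of truth, that a failed write to the new database should not fail the request, and that failed keys are repaired later. DualWriter and FlakyStore are illustrative names, and the stores are plain dict-like objects standing in for real database clients.

```python
class DualWriter:
    """Write to the old store first, then mirror the write to the new one."""

    def __init__(self, old_store, new_store):
        self.old_store = old_store  # remains the source of truth
        self.new_store = new_store  # migration target
        self.pending = set()        # keys that failed to mirror

    def write(self, key, value):
        self.old_store[key] = value  # a failure here propagates to the caller
        try:
            self.new_store[key] = value
        except Exception:
            self.pending.add(key)    # do not fail the request; repair later

    def reconcile(self):
        """Retry mirroring for keys that previously failed."""
        for key in sorted(self.pending):
            self.new_store[key] = self.old_store[key]
            self.pending.discard(key)


class FlakyStore(dict):
    """Hypothetical target store that can be switched into a failing state."""
    down = True

    def __setitem__(self, key, value):
        if self.down:
            raise ConnectionError("target unavailable")
        super().__setitem__(key, value)


old, new = {}, FlakyStore()
writer = DualWriter(old, new)
writer.write("user:1", "alice")  # mirrored write fails and is recorded
new.down = False
writer.reconcile()               # replays the missed write
print(dict(new), writer.pending)
```

A real implementation also has to handle ordering between concurrent writers and deletes, which is a large part of why replication is usually the better choice.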

The Cutover

Plan your cutover like a military operation:

  1. Runbook: Step-by-step document for every action during cutover
  2. Rollback plan: Specific steps to revert if something goes wrong
  3. Communication plan: Who to notify before, during, and after
  4. Timing: Choose a low-traffic window
  5. Team: Everyone involved should be online and on a call
  6. Monitoring: Extra dashboards and alerts during the cutover period

Practice the cutover in staging at least once. Ideally twice.
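A runbook and its rollback plan can be rehearsed as code. The sketch below pairs every cutover step with the action that undoes it and, if a step fails, reverts the completed steps in reverse order. The step names and actions are placeholders that only append to a log; a real runbook would call your own tooling.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    apply: Callable[[], None]
    rollback: Callable[[], None]


def run_cutover(steps):
    """Run steps in order; on failure, undo completed steps in reverse."""
    completed = []
    for step in steps:
        try:
            step.apply()
            completed.append(step)
        except Exception as exc:
            print(f"FAILED: {step.name}: {exc}")
            # The failed step is not in `completed`, so only finished
            # steps are rolled back.
            for done in reversed(completed):
                done.rollback()
            return False
    return True


log = []


def fail_dns():
    raise RuntimeError("health check failed")  # simulated failure


# Hypothetical steps for illustration.
steps = [
    Step("freeze writes", lambda: log.append("freeze"), lambda: log.append("unfreeze")),
    Step("promote replica", lambda: log.append("promote"), lambda: log.append("demote")),
    Step("switch DNS", fail_dns, lambda: log.append("revert DNS")),
]
ok = run_cutover(steps)
print(ok, log)
```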

Phase 5: Validation

Smoke Tests

Immediately after cutover:

  • All endpoints return expected responses
  • Authentication works
  • Database reads and writes succeed
  • Scheduled jobs execute
  • Integrations with external services work
  • Email/notification systems function
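Part of the checklist above can be automated so it runs in seconds after cutover. Here is a small sketch using only the standard library: it requests each URL and reports any whose status code differs from the expected one. The endpoints are hypothetical, and the example run uses stubbed responses so it works offline.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def http_status(url, timeout=5):
    """Return the HTTP status code for url, or None if it is unreachable."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as err:  # 4xx/5xx responses arrive as exceptions
        return err.code
    except (URLError, OSError):
        return None


def run_smoke_tests(checks, fetch=http_status):
    """checks maps URL -> expected status; returns {url: actual} for failures."""
    failures = {}
    for url, expected in checks.items():
        actual = fetch(url)
        if actual != expected:
            failures[url] = actual
    return failures


# Hypothetical endpoints; replace with your own.
checks = {
    "https://app.example.com/healthz": 200,
    "https://app.example.com/login": 200,
    "https://app.example.com/admin": 401,  # must reject anonymous requests
}

# Stubbed responses so the sketch runs offline; drop `fetch=` for real requests.
stub = {
    "https://app.example.com/healthz": 200,
    "https://app.example.com/login": 503,
    "https://app.example.com/admin": 401,
}
failures = run_smoke_tests(checks, fetch=stub.get)
print(failures)
```

Status codes only prove that something answered. Checks for database writes, scheduled jobs, and notifications still need their own probes.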

Performance Validation

Compare against your baselines:

  • Response times are within an acceptable range of the baseline
  • Error rates haven't increased
  • Resource utilization is as expected
  • No memory leaks or connection pool exhaustion
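Comparing against the baseline is simple enough to script. In the sketch below, a metric counts as a regression when it is more than a chosen tolerance above its pre-migration value. The 10% tolerance, the metric names, and the numbers are all assumptions for illustration; pick thresholds that match your own service-level targets.

```python
def regressions(baseline, current, tolerance=0.10):
    """Return {metric: (before, after)} for metrics that got worse.

    A metric is worse when it exceeds its baseline by more than `tolerance`,
    or when it is missing from the current measurements.
    """
    worse = {}
    for metric, before in baseline.items():
        after = current.get(metric)
        if after is None or after > before * (1 + tolerance):
            worse[metric] = (before, after)
    return worse


# Hypothetical measurements (lower is better for every metric here).
baseline = {"p50_ms": 40, "p95_ms": 120, "p99_ms": 300, "error_rate": 0.002}
current = {"p50_ms": 42, "p95_ms": 150, "p99_ms": 310, "error_rate": 0.002}
print(regressions(baseline, current))
```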

Security Validation

  • All network access controls are in place
  • No unintended public exposure
  • Encryption at rest and in transit verified
  • IAM policies are correct
  • Audit logging is capturing events

Phase 6: Post-Migration

Optimization

Once stable in the cloud, optimize:

  • Right-size instances based on actual utilization
  • Implement auto-scaling for variable workloads
  • Evaluate reserved instances or committed use discounts
  • Move appropriate workloads to spot/preemptible instances
  • Enable cloud-native monitoring and alerting
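When evaluating commitments, the key number is how much of the time an instance actually runs. This back-of-the-envelope sketch assumes a commitment is billed for every hour while on-demand is billed only for hours used. The hourly rates are invented for the example; check your provider's current pricing.

```python
HOURS_PER_MONTH = 730  # average hours in a month


def monthly_cost(hourly_rate, utilization=1.0):
    """Cost of one instance for a month at a given fraction of hours used."""
    return hourly_rate * HOURS_PER_MONTH * utilization


def break_even_utilization(on_demand_hourly, committed_hourly):
    """Fraction of hours an instance must run before a commitment is cheaper."""
    return committed_hourly / on_demand_hourly


# Hypothetical rates for illustration only.
on_demand, committed = 0.10, 0.06
threshold = break_even_utilization(on_demand, committed)
print(f"Commit only if the instance runs more than {threshold:.0%} of the time")

# An instance that runs 40% of the time is cheaper on demand.
print(f"on-demand at 40%: {monthly_cost(on_demand, 0.4):.2f}")
print(f"committed:        {monthly_cost(committed):.2f}")
```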

Decommission Old Infrastructure

Don't rush this. Keep old infrastructure running (but not serving traffic) for two to four weeks after cutover. You'll be glad you did when you discover something you forgot to migrate.

Document Everything

Update your documentation to reflect the new architecture:

  • Network diagrams
  • Runbooks and playbooks
  • On-call procedures
  • Disaster recovery plan
  • Cost monitoring and alerts

Common Pitfalls

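  • Underestimating data transfer time: Moving terabytes takes longer than you think. Even at full line rate, 10 TB over a 1 Gbps link takes about 22 hours, and real transfers rarely sustain line rate. Start early.
  • Forgetting about DNS TTL: Resolvers keep serving a cached record until its old TTL expires, so lower your TTLs at least one full old-TTL period before the cutover.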
  • Not testing backups: A backup you haven't restored isn't a backup.
  • Skipping the staging rehearsal: If you haven't practiced the migration end-to-end, you're not ready.
  • Trying to migrate and refactor at the same time: Migrate first, optimize later.
  • Ignoring costs: Cloud resources start costing money immediately. Set up billing alerts before you start.
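Two of these pitfalls come down to arithmetic that is worth doing before you set a date. The sketch below estimates transfer time for a given data volume and link speed, and works out the latest moment to lower a DNS TTL ahead of a cutover. The 70% link efficiency, the data volume, and the dates are assumptions for illustration.

```python
from datetime import datetime, timedelta


def transfer_hours(data_tb, link_mbps, efficiency=0.7):
    """Estimate hours to copy data_tb (decimal terabytes) over a link.

    efficiency is an assumed fraction of line rate actually sustained.
    """
    bits = data_tb * 8 * 10**12
    usable_bits_per_second = link_mbps * 10**6 * efficiency
    return bits / usable_bits_per_second / 3600


def latest_ttl_change(cutover, old_ttl_seconds):
    """Latest moment to lower a TTL so old cached records expire by cutover."""
    return cutover - timedelta(seconds=old_ttl_seconds)


# Hypothetical figures for illustration.
print(f"10 TB over 1 Gbps: {transfer_hours(10, 1000):.1f} hours")
cutover = datetime(2025, 3, 1, 2, 0)
print("Lower a 24-hour TTL no later than", latest_ttl_change(cutover, 86400))
```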
