Kubernetes Docker Swarm Safe Installation: Zero Downtime Migration

Asep Alazhari

Learn how to install Kubernetes alongside a live Docker Swarm environment. This guide ensures zero downtime for production services during migration.

The late-night alerts and the familiar “service is down” messages are every DevOps engineer’s worst nightmare. I’ve been there countless times, staring at monitoring dashboards at 3 AM, troubleshooting production environments where every second of downtime translates to revenue loss and frustrated users. But sometimes, these challenging moments become the catalyst for growth and innovation.

This migration story began with a personal challenge and a strategic opportunity converging at the perfect moment. I had been running Docker Swarm for years, and while it served us well, I couldn’t ignore the writing on the wall: Kubernetes had become the industry standard. Docker Swarm, despite its simplicity, was increasingly feeling outdated in a landscape where Kubernetes dominated enterprise container orchestration. The ecosystem, tooling, job market, and community momentum were all pointing in one direction.

The motivation wasn’t just technical curiosity; it was about future-proofing my skills and infrastructure. I needed to challenge myself to master Kubernetes, not just as an academic exercise, but through real-world implementation with actual production workloads. When the opportunity arose to migrate several applications to new server infrastructure, the timing couldn’t have been more perfect. This was my chance to bridge theory with practice.

However, the business reality was unforgiving: zero downtime tolerance. The production workload, let’s call it prod-app, was serving thousands of users daily, while two staging services (staging-app-1 and staging-app-2) were critical for ongoing development cycles. This wasn’t just a personal learning journey; it was a high-stakes balancing act between embracing the future of container orchestration and maintaining rock-solid stability for existing operations.

The breakthrough came with what I call the “parallel coexistence strategy”: running a complete Kubernetes cluster alongside the existing Docker Swarm infrastructure without disrupting a single production container. This approach proved that you can pursue cutting-edge technology adoption while maintaining operational excellence. Sometimes the best way to learn isn’t to replace, but to build alongside and gradually transition.


The Strategic Decision: Why I Chose a Parallel Installation

Migrating production systems is inherently risky. A simple misstep could lead to catastrophic downtime, impacting users and revenue. My strategy was to mitigate this risk entirely by avoiding a direct in-place upgrade. Instead of converting or replacing the existing Docker Swarm, I opted to build a new Kubernetes cluster on the same physical servers. This ultra-safe approach allowed me to:

  • Maintain 100% Uptime for Production: The prod-app remained on Docker Swarm, completely isolated from the Kubernetes installation process through careful resource partitioning.
  • Safe Staging Environment: I designated the staging services, staging-app-1 and staging-app-2, as the first candidates for Kubernetes migration testing. This provided a controlled sandbox to validate the new orchestration platform without any production risk.
  • Zero-Impact Changes: The installation methodology was meticulously designed to avoid system reboots, hostname modifications, or firewall reconfigurations, ensuring that existing services continued operating normally.
  • Resource Coexistence: Both container runtimes (Docker and containerd) operated simultaneously using different socket interfaces, preventing any resource conflicts or performance degradation.

This method gave me the best of both worlds: the stability of the current production environment and a secure pathway to a more modern container orchestration platform.

Phase 1: Pre-Installation & System Preparation

Before diving into the Kubernetes installation, I implemented a comprehensive pre-installation audit across all six servers (k8s-node01 through k8s-node06). This “read-only” assessment phase was designed to establish a baseline understanding of the current infrastructure without making any modifications that could impact running services.

The audit covered several critical areas: OS version compatibility (ensuring CentOS 7.x/RHEL 7.x baseline), hostname verification to avoid DNS conflicts, Docker Swarm cluster health validation, and available system resources. Most importantly, I verified that all production services were running optimally before proceeding.

One of the most critical yet surprisingly low-risk steps was updating the /etc/hosts file across all nodes with the Kubernetes cluster topology. This host resolution setup is fundamental for Kubernetes inter-node communication, but it’s completely transparent to existing Docker Swarm services since they continue using their established service discovery mechanisms.

Code Block: System Status Verification & Host Resolution

# Execute on ALL servers (k8s-node01 through k8s-node06)
# Validate Docker Swarm cluster health
docker ps --format "table {{.Names}}\t{{.Image}}\t{{.Status}}"
docker service ls  # Verify all services are running
echo "✅ Docker Swarm cluster is healthy - safe to proceed"

# Add Kubernetes cluster nodes to /etc/hosts for proper name resolution
sudo tee -a /etc/hosts <<EOF
192.168.1.10 k8s-node01
192.168.1.11 k8s-node02
192.168.1.12 k8s-node03
192.168.1.13 k8s-node04
192.168.1.14 k8s-node05
192.168.1.15 k8s-node06
EOF
echo "✅ Kubernetes cluster host resolution configured"


Phase 2: Container Runtime Coexistence

One of the biggest challenges was getting Kubernetes to run alongside Docker Swarm. Kubernetes requires a CRI (Container Runtime Interface)-compatible runtime, and while Docker could still be used through the cri-dockerd shim, I chose containerd for a cleaner, more isolated setup.

I installed containerd on every server. This was a non-disruptive process as containerd is a lightweight runtime that can coexist with the Docker daemon without conflict. The main configuration change was enabling the SystemdCgroup driver, a requirement for Kubernetes, within the containerd configuration file.

Code Block: Installing and Configuring containerd

# Execute on ALL servers
sudo dnf install -y containerd  # on CentOS/RHEL 7, use: sudo yum install -y containerd
sudo systemctl enable containerd
sudo systemctl start containerd

# Enable the SystemdCgroup driver that kubelet requires
containerd config default | sudo tee /etc/containerd/config.toml > /dev/null
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
sudo systemctl restart containerd

# Verify both are running
sudo systemctl status docker --no-pager
sudo systemctl status containerd --no-pager
echo "✅ Both Docker and containerd are running in parallel"


Phase 3 & 4: Cluster Initialization & Control Plane Setup

With the infrastructure foundation established, I proceeded to install the core Kubernetes components (kubelet, kubeadm, kubectl) across all nodes. The installation strategy involved deploying these components system-wide while keeping them dormant until the cluster initialization phase. The kubeadm init command was executed exclusively on the designated primary control plane node, k8s-node01.

The initialization command included key flags that were critical for the ultra-safe approach:

  • --ignore-preflight-errors=Swap: This flag allowed me to bypass kubeadm’s default swap-off preflight check, which would otherwise have necessitated a system-level change. (Note that the kubelet itself must also be configured with failSwapOn: false to tolerate swap at runtime.)
  • --cri-socket=unix:///var/run/containerd/containerd.sock: This explicitly told kubeadm to use the newly installed containerd runtime, ensuring it didn’t interfere with Docker.

After the primary node was initialized, I saved the generated join commands for the other nodes and configured kubectl to manage the new cluster. This gave me the first look at the cluster’s health and confirmed the new control plane was up and running.
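For reference, the full initialization command looked roughly like the sketch below. The control-plane endpoint and pod CIDR are illustrative choices of mine, not values from the original setup; the CIDR is deliberately picked to avoid the 192.168.1.x node subnet.

```shell
# Run ONLY on k8s-node01, the primary control plane node
sudo kubeadm init \
    --control-plane-endpoint "k8s-node01:6443" \
    --pod-network-cidr "10.244.0.0/16" \
    --cri-socket "unix:///var/run/containerd/containerd.sock" \
    --ignore-preflight-errors=Swap \
    --upload-certs

# Point kubectl at the new cluster for the current user
mkdir -p "$HOME/.kube"
sudo cp -i /etc/kubernetes/admin.conf "$HOME/.kube/config"
sudo chown "$(id -u):$(id -g)" "$HOME/.kube/config"
```

The --upload-certs flag makes kubeadm print a certificate key so that a second control plane node can join later; whichever pod CIDR you choose must then be mirrored in the Calico configuration.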


Final Phases: CNI, Node Joins, and Cluster Validation

The remaining steps involved bringing the rest of the cluster online and preparing it for workloads.

  • CNI Installation: I installed Calico, a popular CNI (Container Network Interface) plugin, to enable network communication between pods. This step was performed on k8s-node01 and immediately brought the nodes into a “Ready” state.
  • Node Joining: I used the saved kubeadm join commands to add k8s-node02 as a second control plane node (ensuring high availability) and configured the remaining four servers (k8s-node03 through k8s-node06) as worker nodes.
  • Tools & Ingress: I installed Helm for package management, a local storage provisioner for persistent volumes, and HAProxy Ingress Controller to manage external traffic. Since I was behind a corporate firewall, I used a manual YAML-based installation for HAProxy, which proved to be a reliable workaround.

Finally, a comprehensive validation confirmed my success: all Kubernetes nodes were ready, system pods were running, and, most importantly, the production prod-app was still humming along on Docker Swarm.
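The validation pass itself boiled down to a handful of read-only commands, sketched here assuming kubectl is configured on a control plane node:

```shell
# Kubernetes side: every node Ready, all system pods Running
kubectl get nodes -o wide
kubectl get pods -n kube-system

# Docker Swarm side: production untouched
docker service ls
docker service ps prod-app
```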

This phased, ultra-safe approach proved that it’s possible to adopt new technology without sacrificing the stability of your current production environment. The staging services are now running on Kubernetes, and I can validate functionality and performance before making the full leap, all while the business-critical service continues to operate without interruption.

Architecture Overview: NodePort vs Ingress Strategy

One of the key architectural decisions that emerged from this migration was implementing a NodePort-based routing strategy for the applications. Unlike traditional Ingress-based approaches, I chose direct NodePort access for several compelling reasons:

Why NodePort Architecture?

  • Simplified Network Path: Direct external access via NodeIP:30080 eliminates the complexity of Ingress controllers and load balancer configurations
  • Reduced Failure Points: Fewer network hops mean fewer potential points of failure in the request path
  • Corporate Firewall Compatibility: Many enterprise environments have strict firewall policies that make NodePort access more straightforward than complex Ingress setups
  • Performance Benefits: Direct access reduces latency compared to multiple proxy layers

Implementation Pattern

# NodePort Service Configuration
apiVersion: v1
kind: Service
metadata:
    name: staging-app-1
    namespace: staging
spec:
    type: NodePort
    ports:
        - port: 3000
          targetPort: 3000
          nodePort: 30080
    selector:
        app: staging-app-1

This architecture enabled me to expose staging-app-1 directly on port 30080 across all nodes, with external HAProxy routing /app1 requests to NodeIP:30080 for seamless user access.
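As an illustration, the external HAProxy rule for the /app1 path might look like the fragment below. The server IPs are the placeholder addresses from the /etc/hosts example earlier; your backend list would include whichever nodes you want in rotation.

```
frontend http-in
    bind *:80
    acl is_app1 path_beg /app1
    use_backend app1_nodeport if is_app1

backend app1_nodeport
    balance roundrobin
    # Any node works: kube-proxy routes NodePort traffic to a matching pod
    server k8s-node03 192.168.1.12:30080 check
    server k8s-node04 192.168.1.13:30080 check
```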

Configuration Management Strategy

One of the most important lessons learned during this migration was the critical importance of proper configuration management. Kubernetes provides two primary mechanisms for configuration data: ConfigMaps for non-sensitive data and Secrets for sensitive information.

ConfigMap vs Secret Separation

ConfigMap Usage (Public Configuration):

apiVersion: v1
kind: ConfigMap
metadata:
    name: app-config
data:
    NODE_ENV: "production"
    APP_PORT: "3000"
    BASE_PATH: "/app"
    PUBLIC_API_URL: "https://api.example.com"

Secret Usage (Sensitive Data):

apiVersion: v1
kind: Secret
metadata:
    name: app-secrets
type: Opaque
data:
    API_KEY: <base64-encoded-value>
    DATABASE_URL: <base64-encoded-value>
    JWT_SECRET: <base64-encoded-value>

This separation ensures that sensitive information is encrypted at rest and properly managed through Kubernetes RBAC, while public configuration remains easily readable and maintainable.
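One gotcha: the values under data: must be base64-encoded, and a stray trailing newline in the encoding is a classic source of broken credentials. A safe way to generate them ('my-api-key' is just an example literal):

```shell
# printf avoids the trailing newline that echo would bake into the encoding
printf '%s' 'my-api-key' | base64
# -> bXktYXBpLWtleQ==
```

Alternatively, kubectl create secret generic with --from-literal flags handles the encoding for you, and the stringData field accepts plain-text values if you prefer to skip manual encoding entirely.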

Production-Ready Features

The migration wasn’t just about moving containers from Docker Swarm to Kubernetes; it was about implementing enterprise-grade features that would improve the operational posture.

Horizontal Pod Autoscaler (HPA) Setup

One of Kubernetes’ most powerful features is automatic scaling based on resource utilization:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
    name: staging-app-hpa
spec:
    scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: staging-app-1
    minReplicas: 15
    maxReplicas: 60
    metrics:
        - type: Resource
          resource:
              name: cpu
              target:
                  type: Utilization
                  averageUtilization: 65
        - type: Resource
          resource:
              name: memory
              target:
                  type: Utilization
                  averageUtilization: 75

This configuration automatically scales the staging application between 15 and 60 replicas based on CPU (65% threshold) and memory (75% threshold) utilization, ensuring optimal resource usage and performance.

Pod Disruption Budget (PDB) Configuration

To maintain high availability during cluster maintenance or updates, I implemented Pod Disruption Budgets:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
    name: staging-app-pdb
spec:
    minAvailable: 75%
    selector:
        matchLabels:
            app: staging-app-1

This ensures that at least 75% of the application pods remain available during voluntary disruptions like node maintenance or cluster upgrades.

Security Context Implementation

Security was paramount in the production-ready setup. I implemented comprehensive security contexts:

spec:
    template:
        spec:
            securityContext:
                runAsNonRoot: true
                runAsUser: 1001
                fsGroup: 1001
            containers:
                - name: app
                  securityContext:
                      allowPrivilegeEscalation: false
                      readOnlyRootFilesystem: true
                      capabilities:
                          drop:
                              - ALL
                  volumeMounts:
                      - name: tmp-volume
                        mountPath: /tmp
                      - name: cache-volume
                        mountPath: /app/.next/cache

This configuration ensures the applications run as non-root users with minimal privileges and read-only root filesystems, significantly reducing the attack surface.
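One detail the snippet above implies but does not show: with readOnlyRootFilesystem: true, every path the application writes to needs an explicit writable mount. The tmp-volume and cache-volume mounts would be backed by pod-level emptyDir volumes, roughly like this:

```yaml
spec:
    template:
        spec:
            volumes:
                - name: tmp-volume
                  emptyDir: {}
                - name: cache-volume
                  emptyDir: {}
```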

Lessons Learned and Best Practices

1. Resource Monitoring is Critical

During the parallel installation, I discovered that monitoring both Docker Swarm and Kubernetes resource consumption was essential. I implemented comprehensive monitoring using Prometheus and Grafana to track:

  • Memory usage across both platforms
  • CPU utilization patterns
  • Network traffic segregation
  • Disk I/O for different container runtimes

2. GitLab CI/CD Integration

The migration also involved updating our CI/CD pipelines to support both platforms during the transition period. I implemented a sophisticated deployment strategy:

stages:
    - build
    - deploy-swarm
    - deploy-k8s
    - validate

deploy-staging-k8s:
    stage: deploy-k8s
    script:
        - kubectl apply -k k8s/
        - kubectl rollout status deployment/staging-app-1 -n staging
    only:
        - main
    when: manual # Safety measure for controlled deployments

3. Health Check Evolution

One unexpected benefit was the evolution of the health check strategies. Kubernetes’ native health check capabilities (liveness, readiness, and startup probes) provided much more granular control than Docker Swarm’s basic health checks:

livenessProbe:
    httpGet:
        path: /api/health
        port: 3000
    initialDelaySeconds: 30
    periodSeconds: 10
readinessProbe:
    httpGet:
        path: /api/ready
        port: 3000
    initialDelaySeconds: 5
    periodSeconds: 5
startupProbe:
    httpGet:
        path: /api/startup
        port: 3000
    initialDelaySeconds: 10
    periodSeconds: 5
    failureThreshold: 30

This tri-probe approach significantly improved the application reliability and deployment success rates.

The journey from Docker Swarm to Kubernetes doesn’t have to be a leap of faith. With careful planning, parallel installation strategies, and comprehensive testing, you can achieve zero-downtime migration while building a more scalable and maintainable container orchestration platform. The key is patience, thorough preparation, and always having a rollback plan ready.
