Astro Background Caching and CDN Image Optimization

Asep Alazhari

How I reduced image load times by 40x and cut bandwidth by 95% using background caching and local image optimization in Astro.

The Challenge: Building a Fast News Landing Page on a Budget

Picture this: You’re building a news landing page, fetching the latest news from an external API, displaying images from a CDN. Sounds straightforward, right?

Not quite.

I quickly ran into performance problems that would make any developer cringe:

  • External images loading at a snail’s pace (50-200ms per image)
  • API calls overwhelming the news provider and hitting rate limits regularly
  • Tight resource constraints (512MB RAM, 0.5 CPU core per container)
  • Unpredictable CDN reliability where sometimes images just did not load

The user experience was terrible. Pages felt sluggish. Images popped in awkwardly. And worst of all, I was constantly worried about hitting API rate limits or the CDN going down.

I needed a solution that would:

  1. Reduce load on the external news API
  2. Speed up image loading dramatically
  3. Stay within my resource constraints
  4. Work reliably in production

What followed was a three-month journey of iteration, learning, and optimization. Here’s how I went from a slow, unreliable news page to a lightning-fast, self-contained system that reduced bandwidth by 95% and made images load 40x faster.

First Attempt: In-Memory Caching

My first approach seemed logical: implement a simple 30-minute cache in memory.

I was migrating from Next.js to Astro anyway, so I figured I’d add basic caching at the same time. The results were immediately promising:

// Simple in-memory cache (my first attempt)
let newsCache = {
    data: [],
    timestamp: 0,
};

const CACHE_DURATION = 30 * 60 * 1000; // 30 minutes

export async function GET() {
    const now = Date.now();

    // Check if cache is still valid
    if (newsCache.data.length > 0 && now - newsCache.timestamp < CACHE_DURATION) {
        return new Response(JSON.stringify(newsCache), {
            headers: { "Content-Type": "application/json" },
        });
    }

    // Fetch fresh data
    const freshData = await fetchNewsFromAPI();
    newsCache = { data: freshData, timestamp: now };

    return new Response(JSON.stringify(newsCache), {
        headers: { "Content-Type": "application/json" },
    });
}

The Good News:

  • Achieved 80% resource reduction by consolidating from 5 containers down to 1
  • API calls dropped dramatically (from hundreds per hour to just two per hour)
  • Memory usage stayed low (around 100-150MB)

The Problem I Discovered:

One day, I restarted the container for an update. The cache was gone. Users saw slow load times again until the cache rebuilt itself.

Learning Number 1: In-memory solutions are great for performance, but terrible for persistence. Production systems need to survive restarts gracefully.

Second Iteration: File-Based Persistent Cache

Three days later, I had a revelation. Why not save the cache to disk?

I created a newsCache.js utility that would:

  • Save cached news data to a JSON file
  • Read from the file on startup
  • Provide a manual regeneration endpoint

Here’s the core implementation:

import { promises as fs } from "fs";
import path from "path";

const CACHE_DIR = path.join(process.cwd(), "dist", "cache");
const CACHE_FILE = path.join(CACHE_DIR, "news-data.json");

// Ensure cache directory exists
async function ensureDirectories() {
    try {
        await fs.mkdir(CACHE_DIR, { recursive: true });
    } catch (error) {
        console.error("Error creating cache directory:", error);
    }
}

// Read cache from file
export async function readCache() {
    try {
        const data = await fs.readFile(CACHE_FILE, "utf-8");
        return JSON.parse(data);
    } catch (error) {
        // Return empty cache if file doesn't exist
        return { data: [], timestamp: 0, updatedAt: null };
    }
}

// Save cache to file
async function saveCache(newsItems) {
    await ensureDirectories();

    const cacheData = {
        data: newsItems,
        timestamp: Date.now(),
        updatedAt: new Date().toISOString(),
    };

    await fs.writeFile(CACHE_FILE, JSON.stringify(cacheData, null, 2));
    console.log(`[NewsCache] Saved ${newsItems.length} items to cache`);
}

I also added a manual regeneration endpoint:

// /api/regenerate - Manual cache refresh
import { regenerateCache, readCache } from "../../utils/newsCache.js";

export async function POST({ request }) {
    const apiKey = request.headers.get("X-API-Key");

    // Simple API key authentication
    if (apiKey !== process.env.REGEN_API_KEY) {
        return new Response(JSON.stringify({ error: "Unauthorized" }), {
            status: 401,
        });
    }

    // Report the real outcome instead of hard-coding success
    const success = await regenerateCache();
    const cache = await readCache();

    return new Response(
        JSON.stringify({
            success,
            count: cache.data.length,
            timestamp: cache.timestamp,
        }),
        {
            headers: { "Content-Type": "application/json" },
        }
    );
}
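
Any external scheduler (or a quick manual test) can then refresh the cache with a single authenticated request. A minimal sketch, with a placeholder deployment URL:

// Hypothetical trigger: refresh the cache from outside the app
await fetch("https://example.com/api/regenerate", {
    method: "POST",
    headers: { "X-API-Key": process.env.REGEN_API_KEY },
});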

Why File-Based Cache?

  1. Persistence (survives container restarts)
  2. Memory Efficient (does not consume RAM when not in use)
  3. Easy to Debug (just cat news-data.json to see what is cached)
  4. No External Dependencies (no Redis, Memcached, or database needed)

The Remaining Problem:

I still needed an external cron job to hit the /api/regenerate endpoint every 2 hours. This felt clunky. I had to set up a separate service just to keep my cache fresh.

Learning Number 2: Persistence solves the restart problem, but external dependencies create new operational complexity.


The Breakthrough: Background Scheduler and Image Optimization

Two months passed. The system was working, but that external cron job nagged at me. There had to be a better way.

Then it hit me. What if the scheduler lived inside the application itself?

And while I was at it, why not solve the slow CDN images problem too?

The Architecture

I designed a system with three core components:

  1. Background Scheduler (node-cron): Runs inside the container, refreshes cache every 2 hours
  2. Image Downloader: Downloads CDN images to local filesystem during cache refresh
  3. Orchestrator (start.js): Coordinates both Astro server and the scheduler

Here’s how it all fits together:

Docker Container
 └── start.js (Orchestrator)
      ├── Background Scheduler (node-cron)
      │    └── Runs every 2 hours
      │         ├── Fetch news from API
      │         ├── Download images to local filesystem
      │         ├── Save JSON cache file
      │         └── Cleanup old images
      │
      └── Astro Server (SSR mode)
           └── /api/news endpoint serves from cache file
                └── Cache-Control: public, max-age=3600
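
The serving side stays deliberately thin: the /api/news endpoint in the tree above just reads the JSON cache file and sets the Cache-Control header. A minimal sketch, assuming the route lives at src/pages/api/news.js and reuses the readCache utility from earlier:

// src/pages/api/news.js (hypothetical path) - serve news straight from the cache file
import { readCache } from "../../utils/newsCache.js";

export async function GET() {
    const cache = await readCache();

    return new Response(JSON.stringify(cache), {
        headers: {
            "Content-Type": "application/json",
            // Let browsers and proxies reuse the response for an hour
            "Cache-Control": "public, max-age=3600",
        },
    });
}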

Part 1: The Background Scheduler

First, I created the scheduler using node-cron:

// scheduler.js
import cron from "node-cron";
import { regenerateCache, cleanupOldImages } from "./utils/newsCache.js";

export function startScheduler() {
    // Run every 2 hours: 0 */2 * * *
    const scheduleExpression = "0 */2 * * *";

    console.log("[Scheduler] Starting background scheduler...");
    console.log(`[Scheduler] Will run every 2 hours: ${scheduleExpression}`);

    const task = cron.schedule(
        scheduleExpression,
        async () => {
            console.log("[Scheduler] Running scheduled cache regeneration...");

            try {
                const success = await regenerateCache();

                if (success) {
                    console.log("[Scheduler] Cache regenerated successfully");

                    // Cleanup old images that are no longer in cache
                    await cleanupOldImages();
                    console.log("[Scheduler] Image cleanup completed");
                } else {
                    console.warn("[Scheduler] Cache regeneration failed");
                }
            } catch (error) {
                console.error("[Scheduler] Error during scheduled task:", error);
            }
        },
        {
            scheduled: true,
            timezone: "Asia/Jakarta",
        }
    );

    // Run initial cache on startup
    console.log("[Scheduler] Running initial cache regeneration...");
    regenerateCache().catch(console.error);

    return task;
}

Part 2: Local Image Optimization

This is where the magic happens. Instead of serving images from the external CDN, I download them once and serve them locally:

// Image download and processing
const IMAGES_DIR = path.join(process.cwd(), "dist", "client", "images", "news");

function getFilenameFromUrl(url) {
    try {
        const urlObj = new URL(url);
        const pathname = urlObj.pathname;
        const filename = pathname.split("/").pop();
        return filename || `image-${Date.now()}.jpg`;
    } catch (error) {
        return `image-${Date.now()}.jpg`;
    }
}

async function downloadImage(url, filename) {
    try {
        console.log(`[NewsCache] Downloading image: ${filename}`);

        const response = await fetch(url);
        if (!response.ok) throw new Error(`HTTP ${response.status}`);

        const arrayBuffer = await response.arrayBuffer();
        const buffer = Buffer.from(arrayBuffer);

        const filePath = path.join(IMAGES_DIR, filename);
        await fs.writeFile(filePath, buffer);

        // Return local path
        return `/images/news/${filename}`;
    } catch (error) {
        console.error(`[NewsCache] Error downloading ${filename}:`, error.message);
        return null; // Will use original URL as fallback
    }
}

async function processNewsItems(newsItems) {
    await fs.mkdir(IMAGES_DIR, { recursive: true });

    const processedItems = [];

    for (const item of newsItems) {
        const processedItem = { ...item };

        if (item.enclosure) {
            const filename = getFilenameFromUrl(item.enclosure);
            const localPath = await downloadImage(item.enclosure, filename);

            // Keep original URL as fallback
            processedItem.enclosureOriginal = item.enclosure;
            processedItem.enclosure = localPath || item.enclosure;
        }

        processedItems.push(processedItem);
    }

    return processedItems;
}
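
For completeness, the regenerateCache function that the endpoint and scheduler call is essentially glue around these pieces. A rough sketch, reusing fetchNewsFromAPI from the first attempt together with processNewsItems and saveCache above:

// Rough sketch of regenerateCache: fetch, localize images, persist
export async function regenerateCache() {
    try {
        // Pull the latest items from the external news API
        const newsItems = await fetchNewsFromAPI();

        // Download images and rewrite enclosure to local paths
        const processedItems = await processNewsItems(newsItems);

        // Persist everything to dist/cache/news-data.json
        await saveCache(processedItems);

        return true;
    } catch (error) {
        console.error("[NewsCache] Cache regeneration failed:", error);
        return false;
    }
}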

Why This Works So Well:

  1. Performance: Local filesystem reads are 10-40x faster than CDN requests
  2. Reliability: No dependency on external CDN availability
  3. Bandwidth: Images downloaded once every 2 hours, not on every page view
  4. Fallback: Original URL preserved in enclosureOriginal

Part 3: Smart Image Cleanup

I didn’t want old images piling up forever, so I added cleanup logic:

export async function cleanupOldImages() {
    try {
        // Read current cache to get list of active images
        const cache = await readCache();
        const currentImages = new Set();

        cache.data.forEach((item) => {
            if (item.enclosure?.startsWith("/images/news/")) {
                const filename = item.enclosure.split("/").pop();
                currentImages.add(filename);
            }
        });

        // Read all files in images directory
        const files = await fs.readdir(IMAGES_DIR);

        // Delete images not in current cache
        let deletedCount = 0;
        for (const file of files) {
            if (!currentImages.has(file)) {
                await fs.unlink(path.join(IMAGES_DIR, file));
                deletedCount++;
            }
        }

        if (deletedCount > 0) {
            console.log(`[NewsCache] Cleaned up ${deletedCount} old images`);
        }
    } catch (error) {
        console.error("[NewsCache] Error during image cleanup:", error);
    }
}

Part 4: The Orchestrator

Finally, I needed to run both the Astro server and the scheduler together:

// start.js
import { spawn } from "child_process";
import { startScheduler } from "./scheduler.js";

console.log("[Start] Initializing application...");

// Start background scheduler
const schedulerTask = startScheduler();
console.log("[Start] Background scheduler started");

// Start Astro server
const astroServer = spawn("node", ["./dist/server/entry.mjs"], {
    stdio: "inherit",
    env: {
        ...process.env,
        HOST: "0.0.0.0",
        PORT: "4321",
    },
});

// Handle graceful shutdown
process.on("SIGTERM", () => {
    console.log("[Start] Received SIGTERM, shutting down gracefully...");
    schedulerTask.stop();
    astroServer.kill("SIGTERM");
});

process.on("SIGINT", () => {
    console.log("[Start] Received SIGINT, shutting down gracefully...");
    schedulerTask.stop();
    astroServer.kill("SIGINT");
});

astroServer.on("exit", (code) => {
    console.log(`[Start] Astro server exited with code ${code}`);
    process.exit(code);
});

Performance Results: The Numbers Don’t Lie

After deploying this final solution, the improvements were dramatic:

Before vs After Comparison

Metric               Before (CDN)           After (Local)        Improvement
Image Load Time      50-200ms               1-5ms                10-40x faster
API Response Time    100-300ms              1-10ms               30x faster
Network Bandwidth    High (every request)   Low (once per 2h)    95% reduction 📉
CDN Dependency       Critical               None                 100% elimination
Cache Storage        N/A                    ~500KB               Negligible 💾

Resource Usage (Production)

Memory Usage (per container):
├── Base Astro Server: ~80-100 MB
├── Node.js Runtime: ~30-40 MB
├── Background Scheduler: ~5-10 MB (idle)
├── Cache Regeneration (peak): +20-30 MB
└── Image Downloads (peak): +10-15 MB (temporary)

Total: 115-150 MB idle / 145-195 MB peak
Container Limit: 512 MB (60% headroom) ✅

CPU Usage:
├── Idle State: 0.1-0.3%
├── Serving Requests: 1-5% per request
├── Cache Regeneration: 5-15% (every 2 hours)
└── Image Downloads: 10-20% (brief spikes, 5-10s)

Container Limit: 0.5 CPU (50%) ✅

Cost Savings:

  • Bandwidth: ~95% reduction in outbound traffic = lower CDN/egress costs
  • Infrastructure: 80% fewer containers = lower hosting costs
  • Operational: No external cron service needed = simpler architecture

Deployment Architecture: Making It Production-Ready

Getting this to work in production required some clever Docker configuration:

Multi-Stage Dockerfile

# Stage 1: Build
FROM node:20.18.1-alpine AS builder
WORKDIR /home/app

COPY package*.json ./
COPY yarn.lock ./
RUN yarn install --frozen-lockfile

COPY . .
RUN yarn build  # Builds Astro + runs post-build.js

# Stage 2: Production
FROM node:20.18.1-alpine AS runner

# Install dependencies
RUN apk add --no-cache dumb-init curl

# Create app user
RUN addgroup -g 1001 appgroup && \
    adduser -u 1001 -G appgroup -s /bin/sh -D appuser

WORKDIR /home/app

# Create writable directories with correct permissions
RUN mkdir -p /home/app/dist/cache && \
    mkdir -p /home/app/dist/client/images/news && \
    chown -R appuser:appgroup /home/app

# Copy built files and scheduler
COPY --from=builder --chown=appuser:appgroup /home/app/dist ./dist
COPY --from=builder --chown=appuser:appgroup /home/app/package*.json ./
COPY --from=builder --chown=appuser:appgroup /home/app/node_modules ./node_modules

USER appuser

EXPOSE 4321

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
    CMD node -e "require('http').get('http://localhost:4321/api/health', (r) => process.exit(r.statusCode === 200 ? 0 : 1))"

# Start orchestrator (Astro + Scheduler)
CMD ["dumb-init", "node", "./dist/start.js"]
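
The HEALTHCHECK assumes a lightweight /api/health route. Mine does nothing clever; a minimal sketch (hypothetical path src/pages/api/health.js) would be:

// src/pages/api/health.js (hypothetical) - simple liveness probe for the Docker HEALTHCHECK
export async function GET() {
    return new Response(JSON.stringify({ status: "ok", uptime: process.uptime() }), {
        headers: { "Content-Type": "application/json" },
    });
}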

Post-Build Script

To get the scheduler files into the dist directory, I created a post-build script:

// scripts/post-build.js
import { copyFile, mkdir } from "fs/promises";
import { join } from "path";

const srcDir = "src";
const distDir = "dist";

async function main() {
    console.log("[Post-Build] Copying scheduler files to dist...");

    // Create necessary directories
    await mkdir(join(distDir, "utils"), { recursive: true });
    await mkdir(join(distDir, "cache"), { recursive: true });
    await mkdir(join(distDir, "client", "images", "news"), { recursive: true });

    // Copy scheduler files
    await copyFile(join(srcDir, "start.js"), join(distDir, "start.js"));

    await copyFile(join(srcDir, "scheduler.js"), join(distDir, "scheduler.js"));

    await copyFile(join(srcDir, "utils", "newsCache.js"), join(distDir, "utils", "newsCache.js"));

    console.log("[Post-Build] Scheduler files copied successfully");
}

main().catch(console.error);

Add to package.json:

{
    "scripts": {
        "build": "astro build && node scripts/post-build.js"
    }
}

Key Lessons Learned

Looking back on this three-month journey, here are the insights that made the difference:

1. Evolution Is Normal

Don’t expect to nail the perfect solution on the first try. My three iterations taught me:

  • First attempt: In-memory cache (quick win, but fragile)
  • Second iteration: File-based cache (solved persistence, but clunky)
  • Final solution: Integrated in-app scheduler with local images (elegant and production-ready)

Each iteration was a stepping stone, not a failure.

2. File-Based Caching Is Underrated

In a world of Redis and Memcached, it’s easy to forget that the filesystem is a perfectly good cache for many use cases:

  • ✅ No external dependencies
  • ✅ Survives restarts
  • ✅ Easy to debug
  • ✅ Memory efficient

Don’t overcomplicate when simple solutions work.

3. Local Images > CDN for Predictable Performance

This surprised me, but the numbers don’t lie:

  • 10-40x faster load times
  • 95% bandwidth reduction
  • No CDN reliability concerns
  • Minimal storage cost (~500KB)

For content that changes infrequently (every 2 hours), local storage is a no-brainer.

4. Background Jobs Don’t Need External Services

I initially thought I needed a separate cron service. Turns out, node-cron running inside the application is:

  • Simpler to deploy (one container instead of two services)
  • Easier to monitor (logs in one place)
  • More reliable (no network calls between services)

5. Always Include Fallbacks

Notice how I kept enclosureOriginal in the cache? That’s intentional. If image download fails, the app falls back to the CDN URL.

Defense in depth isn’t just for security—it’s for reliability too.
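
In the rendering layer this can be as simple as trying the local path first and swapping in the original URL if the file ever goes missing. A hypothetical helper (my actual component markup differs, and it assumes each item carries a title):

// Hypothetical helper: prefer the local copy, fall back to the original CDN URL on error
function newsImageHtml(item) {
    const fallback = item.enclosureOriginal || item.enclosure;
    return `<img src="${item.enclosure}" alt="${item.title}" loading="lazy"
        onerror="this.onerror=null; this.src='${fallback}'">`;
}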

6. Measure Everything

Without metrics, I wouldn’t have known:

  • That images were taking 50-200ms
  • That I achieved 10-40x improvement
  • That resource usage stayed within limits

Measure, optimize, measure again.
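
Even a crude timer is enough to surface these numbers. A minimal sketch of the kind of comparison I ran, with placeholder URL and path:

// Hypothetical micro-benchmark: time a CDN fetch against a local file read
import { promises as fs } from "fs";

async function timeIt(label, fn) {
    const start = performance.now();
    await fn();
    console.log(`${label}: ${(performance.now() - start).toFixed(1)}ms`);
}

await timeIt("CDN image", () => fetch("https://cdn.example.com/news/photo.jpg"));
await timeIt("Local image", () => fs.readFile("./dist/client/images/news/photo.jpg"));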


When Should You Use This Pattern?

This approach works best when:

  ✅ Content updates infrequently (hourly, every few hours)
  ✅ External APIs have rate limits or performance issues
  ✅ CDN images are slow or unreliable
  ✅ Resource constraints require efficiency
  ✅ Operational simplicity matters (fewer moving parts)

Not ideal for:

  ❌ Real-time data (stock prices, live scores)
  ❌ User-generated content that needs immediate visibility
  ❌ Massive image libraries (local storage gets expensive)

Conclusion: From Slow to Lightning Fast

What started as a frustrating performance problem became a masterclass in iteration and optimization.

By combining:

  • Background scheduling with node-cron
  • File-based persistent caching
  • Local image optimization
  • Smart cleanup strategies

I transformed a slow, unreliable news page into a lightning-fast, self-contained system that:

  • ⚡ Loads images 40x faster
  • 📉 Reduced bandwidth by 95%
  • 💰 Cut infrastructure costs by 80%
  • 🔒 Eliminated external dependencies

The journey taught me that great performance doesn’t require complex infrastructure—sometimes the best solutions are the simplest ones.
