Astro Background Caching and CDN Image Optimization
How I reduced image load times by 40x and cut bandwidth by 95% using background caching and local image optimization in Astro.

The Challenge: Building a Fast News Landing Page on a Budget
Picture this: You’re building a news landing page, fetching the latest news from an external API, displaying images from a CDN. Sounds straightforward, right?
Not quite.
I quickly ran into performance problems that would make any developer cringe:
- External images loading at a snail’s pace (50-200ms per image)
- API calls overwhelming the news provider and hitting rate limits regularly
- Tight resource constraints (512MB RAM, 0.5 CPU core per container)
- Unpredictable CDN reliability where sometimes images just did not load
The user experience was terrible. Pages felt sluggish. Images popped in awkwardly. And worst of all, I was constantly worried about hitting API rate limits or the CDN going down.
I needed a solution that would:
- Reduce load on the external news API
- Speed up image loading dramatically
- Stay within my resource constraints
- Work reliably in production
What followed was a three-month journey of iteration, learning, and optimization. Here’s how I went from a slow, unreliable news page to a lightning-fast, self-contained system that reduced bandwidth by 95% and made images load 40x faster.
First Attempt: In-Memory Caching
My first approach seemed logical: implement a simple 30-minute cache in memory.
I was migrating from Next.js to Astro anyway, so I figured I’d add basic caching at the same time. The results were immediately promising:
// Simple in-memory cache (my first attempt)
let newsCache = {
data: [],
timestamp: 0,
};
const CACHE_DURATION = 30 * 60 * 1000; // 30 minutes
export async function GET() {
const now = Date.now();
// Check if cache is still valid
if (newsCache.data.length > 0 && now - newsCache.timestamp < CACHE_DURATION) {
return new Response(JSON.stringify(newsCache), {
headers: { "Content-Type": "application/json" },
});
}
// Fetch fresh data
const freshData = await fetchNewsFromAPI();
newsCache = { data: freshData, timestamp: now };
return new Response(JSON.stringify(newsCache), {
headers: { "Content-Type": "application/json" },
});
}
The Good News:
- Achieved 80% resource reduction by consolidating from 5 containers down to 1
- API calls dropped dramatically (from hundreds per hour to just 2 per hour)
- Memory usage stayed low (around 100-150MB)
The Problem I Discovered:
One day, I restarted the container for an update. The cache was gone. Users saw slow load times again until the cache rebuilt itself.
Learning Number 1: In-memory solutions are great for performance, but terrible for persistence. Production systems need to survive restarts gracefully.
Second Iteration: File-Based Persistent Cache
Three days later, I had a revelation. Why not save the cache to disk?
I created a newsCache.js utility that would:
- Save cached news data to a JSON file
- Read from the file on startup
- Provide a manual regeneration endpoint
Here’s the core implementation:
import { promises as fs } from "fs";
import path from "path";
const CACHE_DIR = path.join(process.cwd(), "dist", "cache");
const CACHE_FILE = path.join(CACHE_DIR, "news-data.json");
// Ensure cache directory exists
async function ensureDirectories() {
try {
await fs.mkdir(CACHE_DIR, { recursive: true });
} catch (error) {
console.error("Error creating cache directory:", error);
}
}
// Read cache from file
export async function readCache() {
try {
const data = await fs.readFile(CACHE_FILE, "utf-8");
return JSON.parse(data);
} catch (error) {
// Return empty cache if file doesn't exist
return { data: [], timestamp: 0, updatedAt: null };
}
}
// Save cache to file
async function saveCache(newsItems) {
await ensureDirectories();
const cacheData = {
data: newsItems,
timestamp: Date.now(),
updatedAt: new Date().toISOString(),
};
await fs.writeFile(CACHE_FILE, JSON.stringify(cacheData, null, 2));
console.log(`[NewsCache] Saved ${newsItems.length} items to cache`);
}
I also added a manual regeneration endpoint:
// /api/regenerate - Manual cache refresh
export async function POST({ request }) {
const apiKey = request.headers.get("X-API-Key");
// Simple API key authentication
if (apiKey !== process.env.REGEN_API_KEY) {
return new Response(JSON.stringify({ error: "Unauthorized" }), {
status: 401,
});
}
const success = await regenerateCache();
const cache = await readCache();
return new Response(
JSON.stringify({
success,
count: cache.data.length,
timestamp: cache.timestamp,
})
);
}
Why File-Based Cache?
- Persistence (survives container restarts)
- Memory Efficient (does not consume RAM when not in use)
- Easy to Debug (just cat news-data.json to see what is cached)
- No External Dependencies (no Redis, Memcached, or database needed)
The Remaining Problem:
I still needed an external cron job to hit the /api/regenerate endpoint every 2 hours. This felt clunky. I had to set up a separate service just to keep my cache fresh.
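For illustration, the external trigger was nothing more than a scheduled job calling that endpoint. A hypothetical version as a tiny Node script (Node 18+ for global fetch and top-level await; the host below is a placeholder, while REGEN_API_KEY matches the env var the endpoint checks):
// refresh-cache.js - hypothetical external trigger, run from a separate cron job or scheduler
// The URL is a placeholder for the real deployment host.
const response = await fetch("https://example.com/api/regenerate", {
  method: "POST",
  headers: { "X-API-Key": process.env.REGEN_API_KEY },
});

console.log(`[Refresh] ${response.status} ${await response.text()}`);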
Learning Number 2: Persistence solves the restart problem, but external dependencies create new operational complexity.
The Breakthrough: Background Scheduler and Image Optimization
Two months passed. The system was working, but that external cron job nagged at me. There had to be a better way.
Then it hit me. What if the scheduler lived inside the application itself?
And while I was at it, why not solve the slow CDN images problem too?
The Architecture
I designed a system with three core components:
- Background Scheduler (node-cron): Runs inside the container, refreshes cache every 2 hours
- Image Downloader: Downloads CDN images to local filesystem during cache refresh
- Orchestrator (start.js): Coordinates both Astro server and the scheduler
Here’s how it all fits together:
Docker Container
└── start.js (Orchestrator)
    ├── Background Scheduler (node-cron)
    │   └── Runs every 2 hours
    │       ├── Fetch news from API
    │       ├── Download images to local filesystem
    │       ├── Save JSON cache file
    │       └── Cleanup old images
    │
    └── Astro Server (SSR mode)
        └── /api/news endpoint serves from cache file
            └── Cache-Control: public, max-age=3600
Part 1: The Background Scheduler
First, I created the scheduler using node-cron:
// scheduler.js
import cron from "node-cron";
import { regenerateCache, cleanupOldImages } from "./utils/newsCache.js";
export function startScheduler() {
// Run every 2 hours: 0 */2 * * *
const scheduleExpression = "0 */2 * * *";
console.log("[Scheduler] Starting background scheduler...");
console.log(`[Scheduler] Will run every 2 hours: ${scheduleExpression}`);
const task = cron.schedule(
scheduleExpression,
async () => {
console.log("[Scheduler] Running scheduled cache regeneration...");
try {
const success = await regenerateCache();
if (success) {
console.log("[Scheduler] Cache regenerated successfully");
// Cleanup old images that are no longer in cache
await cleanupOldImages();
console.log("[Scheduler] Image cleanup completed");
} else {
console.warn("[Scheduler] Cache regeneration failed");
}
} catch (error) {
console.error("[Scheduler] Error during scheduled task:", error);
}
},
{
scheduled: true,
timezone: "Asia/Jakarta",
}
);
// Run initial cache on startup
console.log("[Scheduler] Running initial cache regeneration...");
regenerateCache().catch(console.error);
return task;
}
Part 2: Local Image Optimization
This is where the magic happens. Instead of serving images from the external CDN, I download them once and serve them locally:
// Image download and processing
const IMAGES_DIR = path.join(process.cwd(), "dist", "client", "images", "news");
function getFilenameFromUrl(url) {
try {
const urlObj = new URL(url);
const pathname = urlObj.pathname;
const filename = pathname.split("/").pop();
return filename || `image-${Date.now()}.jpg`;
} catch (error) {
return `image-${Date.now()}.jpg`;
}
}
async function downloadImage(url, filename) {
try {
console.log(`[NewsCache] Downloading image: ${filename}`);
const response = await fetch(url);
if (!response.ok) throw new Error(`HTTP ${response.status}`);
const arrayBuffer = await response.arrayBuffer();
const buffer = Buffer.from(arrayBuffer);
const filePath = path.join(IMAGES_DIR, filename);
await fs.writeFile(filePath, buffer);
// Return local path
return `/images/news/${filename}`;
} catch (error) {
console.error(`[NewsCache] Error downloading ${filename}:`, error.message);
return null; // Will use original URL as fallback
}
}
async function processNewsItems(newsItems) {
await fs.mkdir(IMAGES_DIR, { recursive: true });
const processedItems = [];
for (const item of newsItems) {
const processedItem = { ...item };
if (item.enclosure) {
const filename = getFilenameFromUrl(item.enclosure);
const localPath = await downloadImage(item.enclosure, filename);
// Keep original URL as fallback
processedItem.enclosureOriginal = item.enclosure;
processedItem.enclosure = localPath || item.enclosure;
}
processedItems.push(processedItem);
}
return processedItems;
}
Why This Works So Well:
- Performance: Local filesystem reads are 10-40x faster than CDN requests
- Reliability: No dependency on external CDN availability
- Bandwidth: Images downloaded once every 2 hours, not on every page view
- Fallback: Original URL preserved in enclosureOriginal
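The regenerateCache function that ties these pieces together isn't shown in this post; a minimal sketch of what it might look like, assuming the fetchNewsFromAPI helper from the first attempt plus the processNewsItems and saveCache functions above:
// Sketch of regenerateCache: fetch fresh news, localize images, persist to disk.
// Assumes fetchNewsFromAPI, processNewsItems, and saveCache from the earlier snippets.
export async function regenerateCache() {
  try {
    console.log("[NewsCache] Regenerating cache...");
    const freshItems = await fetchNewsFromAPI();

    // Download CDN images locally and rewrite enclosure paths
    const processedItems = await processNewsItems(freshItems);

    // Persist the processed items to the JSON cache file
    await saveCache(processedItems);
    return true;
  } catch (error) {
    console.error("[NewsCache] Regeneration failed:", error);
    return false;
  }
}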
Part 3: Smart Image Cleanup
I didn’t want old images piling up forever, so I added cleanup logic:
export async function cleanupOldImages() {
try {
// Read current cache to get list of active images
const cache = await readCache();
const currentImages = new Set();
cache.data.forEach((item) => {
if (item.enclosure?.startsWith("/images/news/")) {
const filename = item.enclosure.split("/").pop();
currentImages.add(filename);
}
});
// Read all files in images directory
const files = await fs.readdir(IMAGES_DIR);
// Delete images not in current cache
let deletedCount = 0;
for (const file of files) {
if (!currentImages.has(file)) {
await fs.unlink(path.join(IMAGES_DIR, file));
deletedCount++;
}
}
if (deletedCount > 0) {
console.log(`[NewsCache] Cleaned up ${deletedCount} old images`);
}
} catch (error) {
console.error("[NewsCache] Error during image cleanup:", error);
}
}
Part 4: The Orchestrator
Finally, I needed to run both the Astro server and the scheduler together:
// start.js
import { spawn } from "child_process";
import { startScheduler } from "./scheduler.js";
console.log("[Start] Initializing application...");
// Start background scheduler
const schedulerTask = startScheduler();
console.log("[Start] Background scheduler started");
// Start Astro server
const astroServer = spawn("node", ["./dist/server/entry.mjs"], {
stdio: "inherit",
env: {
...process.env,
HOST: "0.0.0.0",
PORT: "4321",
},
});
// Handle graceful shutdown
process.on("SIGTERM", () => {
console.log("[Start] Received SIGTERM, shutting down gracefully...");
schedulerTask.stop();
astroServer.kill("SIGTERM");
});
process.on("SIGINT", () => {
console.log("[Start] Received SIGINT, shutting down gracefully...");
schedulerTask.stop();
astroServer.kill("SIGINT");
});
astroServer.on("exit", (code) => {
console.log(`[Start] Astro server exited with code ${code}`);
process.exit(code);
});
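One piece the snippets above don't show is the /api/news endpoint itself. In the final design it no longer fetches anything on request; it simply reads the cache file and sets the Cache-Control header from the diagram. A minimal sketch, assuming the readCache helper from earlier and Astro's file-based API routes:
// src/pages/api/news.js - sketch of the cache-backed endpoint
import { readCache } from "../../utils/newsCache.js";

export async function GET() {
  // Serve whatever the background scheduler last wrote to disk
  const cache = await readCache();
  return new Response(JSON.stringify(cache), {
    headers: {
      "Content-Type": "application/json",
      "Cache-Control": "public, max-age=3600",
    },
  });
}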
Performance Results: The Numbers Don’t Lie
After deploying this final solution, the improvements were dramatic:
Before vs After Comparison
| Metric | Before (CDN) | After (Local) | Improvement |
|---|---|---|---|
| Image Load Time | 50-200ms | 1-5ms | 10-40x faster ⚡ |
| API Response Time | 100-300ms | 1-10ms | 30x faster ⚡ |
| Network Bandwidth | High (every request) | Low (once per 2h) | 95% reduction 📉 |
| CDN Dependency | Critical | None | 100% elimination ✅ |
| Cache Storage | N/A | ~500KB | Negligible 💾 |
Resource Usage (Production)
Memory Usage (per container):
├── Base Astro Server: ~80-100 MB
├── Node.js Runtime: ~30-40 MB
├── Background Scheduler: ~5-10 MB (idle)
├── Cache Regeneration (peak): +20-30 MB
└── Image Downloads (peak): +10-15 MB (temporary)
Total: 115-150 MB idle / 145-195 MB peak
Container Limit: 512 MB (60% headroom) ✅
CPU Usage:
├── Idle State: 0.1-0.3%
├── Serving Requests: 1-5% per request
├── Cache Regeneration: 5-15% (every 2 hours)
└── Image Downloads: 10-20% (brief spikes, 5-10s)
Container Limit: 0.5 CPU (50%) ✅
Cost Savings:
- Bandwidth: ~95% reduction in outbound traffic = lower CDN/egress costs
- Infrastructure: 80% fewer containers = lower hosting costs
- Operational: No external cron service needed = simpler architecture
Deployment Architecture: Making It Production-Ready
Getting this to work in production required some clever Docker configuration:
Multi-Stage Dockerfile
# Stage 1: Build
FROM node:20.18.1-alpine AS builder
WORKDIR /home/app
COPY package*.json ./
COPY yarn.lock ./
RUN yarn install --frozen-lockfile
COPY . .
RUN yarn build # Builds Astro + runs post-build.js
# Stage 2: Production
FROM node:20.18.1-alpine AS runner
# Install dependencies
RUN apk add --no-cache dumb-init curl
# Create app user
RUN addgroup -g 1001 appgroup && \
adduser -u 1001 -G appgroup -s /bin/sh -D appuser
WORKDIR /home/app
# Create writable directories with correct permissions
RUN mkdir -p /home/app/dist/cache && \
mkdir -p /home/app/dist/client/images/news && \
chown -R appuser:appgroup /home/app
# Copy built files and scheduler
COPY --from=builder --chown=appuser:appgroup /home/app/dist ./dist
COPY --from=builder --chown=appuser:appgroup /home/app/package*.json ./
COPY --from=builder --chown=appuser:appgroup /home/app/node_modules ./node_modules
USER appuser
EXPOSE 4321
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
CMD node -e "require('http').get('http://localhost:4321/api/health', (r) => process.exit(r.statusCode === 200 ? 0 : 1))"
# Start orchestrator (Astro + Scheduler)
CMD ["dumb-init", "node", "./dist/start.js"]
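The HEALTHCHECK above assumes an /api/health endpoint exists. The article doesn't include it, but a hypothetical minimal version could be as simple as this:
// src/pages/api/health.js - hypothetical health check endpoint used by the Docker HEALTHCHECK
export async function GET() {
  return new Response(JSON.stringify({ status: "ok", uptime: process.uptime() }), {
    status: 200,
    headers: { "Content-Type": "application/json" },
  });
}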
Post-Build Script
To get the scheduler files into the dist directory, I created a post-build script:
// scripts/post-build.js
import { copyFile, mkdir } from "fs/promises";
import { join } from "path";
const srcDir = "src";
const distDir = "dist";
async function main() {
console.log("[Post-Build] Copying scheduler files to dist...");
// Create necessary directories
await mkdir(join(distDir, "utils"), { recursive: true });
await mkdir(join(distDir, "cache"), { recursive: true });
await mkdir(join(distDir, "client", "images", "news"), { recursive: true });
// Copy scheduler files
await copyFile(join(srcDir, "start.js"), join(distDir, "start.js"));
await copyFile(join(srcDir, "scheduler.js"), join(distDir, "scheduler.js"));
await copyFile(join(srcDir, "utils", "newsCache.js"), join(distDir, "utils", "newsCache.js"));
console.log("[Post-Build] Scheduler files copied successfully");
}
main().catch(console.error);
Add to package.json:
{
"scripts": {
"build": "astro build && node scripts/post-build.js"
}
}
Key Lessons Learned
Looking back on this three-month journey, here are the insights that made the difference:
1. Evolution Is Normal
Don’t expect to nail the perfect solution on the first try. My three iterations taught me:
- First attempt: In-memory cache (quick win, but fragile)
- Second iteration: File-based cache (solved persistence, but clunky)
- Final solution: Integrated solution (elegant and production-ready)
Each iteration was a stepping stone, not a failure.
2. File-Based Caching Is Underrated
In a world of Redis and Memcached, it’s easy to forget that the filesystem is a perfectly good cache for many use cases:
- ✅ No external dependencies
- ✅ Survives restarts
- ✅ Easy to debug
- ✅ Memory efficient
Don’t overcomplicate when simple solutions work.
3. Local Images > CDN for Predictable Performance
This surprised me, but the numbers don’t lie:
- 10-40x faster load times
- 95% bandwidth reduction
- No CDN reliability concerns
- Minimal storage cost (~500KB)
For content that changes infrequently (every 2 hours), local storage is a no-brainer.
4. Background Jobs Don’t Need External Services
I initially thought I needed a separate cron service. Turns out, node-cron running inside the application is:
- Simpler to deploy (one container instead of two services)
- Easier to monitor (logs in one place)
- More reliable (no network calls between services)
5. Always Include Fallbacks
Notice how I kept enclosureOriginal in the cache? That’s intentional. If image download fails, the app falls back to the CDN URL.
Defense in depth isn’t just for security—it’s for reliability too.
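The rendering side isn't shown in this post, but a hypothetical client-side fallback could swap the CDN URL back in if a locally cached image ever fails to load (data-original here is an assumed attribute populated from enclosureOriginal):
// Hypothetical fallback: if a local image 404s, fall back to the original CDN URL.
// Assumes each <img> carries a data-original attribute populated from enclosureOriginal.
document.querySelectorAll("img[data-original]").forEach((img) => {
  img.addEventListener("error", () => {
    if (img.src !== img.dataset.original) {
      img.src = img.dataset.original;
    }
  });
});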
6. Measure Everything
Without metrics, I wouldn’t have known:
- That images were taking 50-200ms
- That I achieved 10-40x improvement
- That resource usage stayed within limits
Measure, optimize, measure again.
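A quick way to get numbers like these is to time both paths side by side. A rough sketch (Node 18+; the URL and file path are placeholders):
// Rough timing comparison: CDN fetch vs. local file read (placeholder URL and path).
import { readFile } from "fs/promises";

let start = performance.now();
await fetch("https://cdn.example.com/news/sample.jpg");
console.log(`CDN fetch: ${(performance.now() - start).toFixed(1)} ms`);

start = performance.now();
await readFile("./dist/client/images/news/sample.jpg");
console.log(`Local read: ${(performance.now() - start).toFixed(1)} ms`);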
When Should You Use This Pattern?
This approach works best when:
✅ Content updates infrequently (hourly, every few hours)
✅ External APIs have rate limits or performance issues
✅ CDN images are slow or unreliable
✅ Resource constraints require efficiency
✅ Operational simplicity matters (fewer moving parts)
Not ideal for:
❌ Real-time data (stock prices, live scores)
❌ User-generated content that needs immediate visibility
❌ Massive image libraries (local storage gets expensive)
Conclusion: From Slow to Lightning Fast
What started as a frustrating performance problem became a masterclass in iteration and optimization.
By combining:
- Background scheduling with node-cron
- File-based persistent caching
- Local image optimization
- Smart cleanup strategies
I transformed a slow, unreliable news page into a lightning-fast, self-contained system that:
- ⚡ Loads images 40x faster
- 📉 Reduced bandwidth by 95%
- 💰 Cut infrastructure costs by 80%
- 🔒 Eliminated external dependencies
The journey taught me that great performance doesn’t require complex infrastructure—sometimes the best solutions are the simplest ones.


