copilot – achraf ben alaya

I Let Five AI Agents Build My App. Here’s Exactly What Happened.

achraf — Mon, 18 May 2026 17:46:32 +0000


We’ve been talking about “AI agents” for over a year now. Every demo is the same: a single model, a long chat, some tool calls. It works. But it doesn’t scale. It doesn’t parallelise. And it doesn’t assign the right brain to the right job.

I’ve been thinking about this problem for a while. Then I watched Burke Holland’s video (https://www.youtube.com/watch?v=-BhfcPseWFQ) and something clicked.

The answer isn’t one smarter agent. It’s five focused ones.

What Orchestration Actually Means

Here’s where most explanations fall apart: they use jargon that doesn’t connect to anything real. “Orchestration,” “agentic workflows,” “multi-modal pipelines.” Great. What does any of that look like in a real project?

Let me give you a concrete model.

You’re building a feature. Normally, you’d prompt Copilot: “Add a filter bar to the task list.” One model handles everything: it plans, writes logic, styles components, and checks imports. It works sequentially, often forgetting context from six steps back.

With orchestration, the same request goes to a coordinator. That coordinator doesn’t write a single line of code. Instead it does three things:

1. Calls a Planner to research your codebase and produce a step-by-step implementation plan, with explicit file assignments

2. Identifies which tasks have no file conflicts, so those can run in parallel

3. Hands scoped tasks to specialist agents (a Coder for logic, a Designer for UI) with clear ownership

The result: two agents working simultaneously on different files, each doing only what they’re best at. No one steps on anyone’s toes. No context bloat. No mixing concerns.

That’s it. That’s orchestration.

The Ultralight Pattern

Burke Holland built something called Ultralight (https://burkeholland.github.io/ultralight), a five-agent setup for VS Code and GitHub Copilot that makes this concrete:

| Agent | Model | Job |
|-------|-------|-----|
| Orchestrator | Claude Sonnet 4.6 | Coordinates. Never implements. |
| Planner | Claude Opus 4.6 | Researches your codebase. Builds the plan. Assigns files. |
| Coder | GPT-5.3-Codex | Writes production code. Scoped to its file list. |
| Designer | Claude Opus 4.6 | Owns all UI/UX. Colors, spacing, layout, component styling. |

The model choices aren’t random. Claude Opus thinks deeply, so use it for planning and design decisions that require understanding context. Codex generates code fast and accurately, so use it for implementation. Sonnet balances speed and intelligence, so use it to coordinate without wasting compute on a task that’s purely management.

Right model. Right job.

The Mechanic That Makes It Work

Here’s what I find genuinely clever about this pattern: file ownership.

The Planner doesn’t just produce a list of steps. It assigns every file to exactly one agent for each phase. Two agents can work in parallel only if their file lists don’t overlap. The moment they’d need to touch the same file, the Orchestrator makes them sequential.

This solves a real problem. Most “parallel agent” demos don’t account for race conditions. Two agents trying to modify `App.tsx` at the same time produce garbage or, worse, one silently overwrites the other’s work.

Ultralight’s answer is strict: one file, one agent, one phase. Period.
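
A minimal sketch of the ownership rule, assuming a hypothetical plan format in which each task lists the files it owns. `Task` and `schedule_phases` are illustrative names, not part of Ultralight, and the sketch only looks at file overlap; it ignores logical dependencies between tasks, which a real Planner also has to respect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """One unit of work in a plan (hypothetical structure, for illustration)."""
    name: str
    agent: str
    files: frozenset


def schedule_phases(tasks):
    """Walk the tasks in order. A task joins the current phase only if it
    shares no file with the tasks already in that phase; otherwise it opens
    a new phase, which makes it run after the conflicting work."""
    phases = []
    for task in tasks:
        if phases:
            claimed = set().union(*(t.files for t in phases[-1]))
            if claimed.isdisjoint(task.files):
                phases[-1].append(task)
                continue
        phases.append([task])
    return phases


tasks = [
    Task("edit app shell", "Coder", frozenset({"App.tsx"})),
    Task("style app shell", "Designer", frozenset({"App.css"})),
    Task("wire api", "Coder", frozenset({"App.tsx", "api.ts"})),
]

for number, phase in enumerate(schedule_phases(tasks), start=1):
    print(number, [task.name for task in phase])
```

With these three tasks, the first two share a phase and the third waits, because it also claims `App.tsx`.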

I built a demo to make this visible: a React task manager (the full code is here: https://github.com/achrafbenalaya/ultralight-agent-orchestration-demo) where every `.jsx` file is Coder territory and every `.css` file is Designer territory. They never cross. The comments in each file even say who owns it.

```jsx
// Coder owns: TaskItem.jsx
// Designer owns: TaskItem.module.css
```

That ownership comment is the whole pattern in two lines.

What I Actually Tested

I ran the orchestrator on a real extension request: “Add priority filtering (All / High / Medium / Low) to the task list. Mobile-friendly. Match the existing design system.”

Here’s the rough sequence of what happened:

1. Planner took about 45 seconds. It read the existing component structure, identified all files it would need to touch, and returned a two-phase plan. Phase 1: Coder adds filter state to `App.jsx` and filter props to `TaskList.jsx`. Phase 2 (parallel): Coder adds filter buttons to a new `FilterBar.jsx`, Designer styles `FilterBar.module.css` and updates `global.css` with new tokens.

2. Orchestrator identified zero conflicts in Phase 2. `FilterBar.jsx` (Coder) and `FilterBar.module.css` + `global.css` (Designer) are separate files, so both agents spawn.

3. Both agents finished in roughly the same window. The Designer used the existing CSS variable system rather than hardcoding values. The Coder added a controlled filter state with a `useState` that defaults to `"all"`.

4. The integration step (Phase 3) was ten lines. Orchestrator gave it back to Coder for wiring: import `FilterBar`, pass `activeFilter` and `setActiveFilter` as props to `TaskList`.

Total prompt-to-working-feature time: under 4 minutes.

That’s not because the models are faster. It’s because the work ran in parallel and there was zero back-and-forth context repair.

The Part People Get Wrong

Every time I demo this to teams, they ask the same question: “Why not just use one model and have it do everything?”

Two reasons.

First, single-model sequential output doesn’t scale with complexity. Ask one model to redesign a component and refactor its API at the same time. It’ll drop one of those tasks, or conflate them in ways that require manual cleanup. Specialists avoid that.

Second, this is how engineering teams actually work. You don’t ask your backend lead to write the CSS. You don’t ask your UI designer to architect the data layer. Roles exist because specialisation produces better output than generalism at every level of complexity above “Hello World.”

Agents are the same. Give one model too many concerns and it optimises for the average of all of them. Give it one responsibility and it excels.

Setting This Up in Your Own Project

You need VS Code with GitHub Copilot (any tier), plus two settings:

```json
{
  "chat.plugins.enabled": true,
  "chat.subagents.allowInvocationsFromSubagents": true
}
```

Then install the plugin from Burke’s repo directly via the Command Palette:

`⌘⇧P` → Chat: Install Plugin From Source → paste `https://github.com/burkeholland/ultralight`

All five agents install in one shot.

After that, you just tag the Orchestrator:

```
@Orchestrator [your feature request]
```

It calls the Planner first. You don’t need to manage the rest: the Orchestrator handles delegation and parallelism automatically.

One thing worth noting: the Coder agent uses [Context7](https://context7.com), an MCP server that fetches live documentation for any library. This matters because GPT-5.3-Codex’s training data is already stale on some APIs. Context7 keeps it honest.

Why This Matters for DevSecOps and Enterprise Teams

I work in enterprise cloud environments: Azure, GCP, Kubernetes, dev AKS, GitHub Actions, Terraform… The pattern applies there too.

Think about an IaC pipeline change: you need a Bicep module update, a GitHub Actions workflow change, and a documentation update. Normally you’d prompt an agent sequentially for each. With orchestration:

– Planner identifies which files each task owns

– IaC agent handles the Bicep module

– CI/CD agent handles the workflow

– Docs agent updates the README

All three in parallel, scoped to non-overlapping files.

The Planner’s file conflict detection is the safety net. You get parallelism without the chaos of concurrent writes.
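
A rough sketch of that safety net for the IaC example, under stated assumptions: the file paths and agent names are hypothetical, and `run_agent` is a stand-in that only reports its scope instead of invoking a real agent. The point is the order of operations: check for overlapping ownership first, and fan out only when there is none.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical ownership map for the pipeline change described above.
OWNERSHIP = {
    "iac-agent": ["infra/main.bicep"],
    "cicd-agent": [".github/workflows/deploy.yml"],
    "docs-agent": ["README.md"],
}


def find_conflicts(ownership):
    """Return every file claimed by more than one agent."""
    owners = {}
    for agent, files in ownership.items():
        for path in files:
            owners.setdefault(path, []).append(agent)
    return {path: agents for path, agents in owners.items() if len(agents) > 1}


def run_agent(agent, files):
    """Stand-in for a real agent call; it only describes what it would touch."""
    return f"{agent} updated {', '.join(files)}"


def run_parallel(ownership):
    """Refuse to start if any file has two owners, then run all agents at once."""
    conflicts = find_conflicts(ownership)
    if conflicts:
        raise ValueError(f"overlapping file ownership: {conflicts}")
    with ThreadPoolExecutor(max_workers=len(ownership)) as pool:
        futures = [pool.submit(run_agent, agent, files) for agent, files in ownership.items()]
        return [future.result() for future in futures]


for line in run_parallel(OWNERSHIP):
    print(line)
```

If two agents claimed the same workflow file, `run_parallel` would raise before any of them started, which is the sequential fallback in miniature.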

What’s Next

Ultralight is intentionally minimal. Five agents, one coordination pattern, zero overhead. That’s the point.

Where I see this going:

More specialised models. We’re already seeing model differentiation: reasoning models for planning, fast code-gen models for implementation. The agent definitions just need to be updated as new models drop.

Cross-IDE portability. The `.agent.md` format is VS Code-specific today, but the same pattern works anywhere that supports subagent invocation.

Security and compliance layers. For enterprise, add one more agent, a Security Reviewer, that runs after Coder and before the Orchestrator closes the task. It scans changed files for secrets, insecure patterns, and policy violations. The Planner assigns it to run sequentially after all coding phases (I will be adding a new agent in another post).
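
To make the Security Reviewer idea concrete, here is a small sketch of the kind of check such an agent could run over changed files. The two patterns and the `scan_changed_files` helper are illustrative assumptions; a real reviewer would rely on a dedicated secret scanner and on project policy, not two regular expressions.

```python
import re

# Illustrative patterns only; far from a complete secret scanner.
SECRET_PATTERNS = {
    "private-key-block": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "hardcoded-password": re.compile(r"(?i)\bpassword\s*[:=]\s*[\"'][^\"']+[\"']"),
}


def scan_changed_files(changed):
    """`changed` maps a file path to its new content.
    Returns a list of (path, line_number, rule_name) findings."""
    findings = []
    for path, content in changed.items():
        for line_number, line in enumerate(content.splitlines(), start=1):
            for rule_name, pattern in SECRET_PATTERNS.items():
                if pattern.search(line):
                    findings.append((path, line_number, rule_name))
    return findings


changed = {
    "src/config.js": 'const retries = 3;\nconst password = "hunter2";\n',
    "src/App.jsx": "export default function App() { return null; }\n",
}
print(scan_changed_files(changed))
```

Because it only reads files, a check like this can run after every coding phase without ever competing for file ownership.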

The Code

Everything I described is in this repo: all five agent definition files, the full demo React app with proper file ownership separation, VS Code settings, and the Context7 MCP configuration.

Clone it, install the agents, drop a feature request to `@Orchestrator`, and watch five models work together.

```bash
git clone https://github.com/achrafbenalaya/copilot-Orchestratio
cd copilot-Orchestratio
npm install && npm run dev
```

The demo app is live at `localhost:3000`. The agents are ready to extend it.

Questions or feedback? Find me on LinkedIn: https://www.linkedin.com/in/achrafbenalaya/ or YouTube: https://www.youtube.com/c/AchrafBenAlaya. I post regularly about Azure, DevSecOps, Copilot (soon GCP too), and AI integration in enterprise environments.

References:

– Ultralight Orchestration Burke Holland : https://gist.github.com/burkeholland/0e68481f96e94bbb98134fa6efd00436
– Ultralight Official Site : https://burkeholland.github.io/ultralight/
– Repo github : achrafbenalaya/copilot-Orchestration

GitHub Copilot Skills for Terraform: 5 On-Demand AI Assistants for Azure Container Apps

achraf — Sun, 29 Mar 2026 13:37:29 +0000

Teaching Copilot to Know Your Stack: GitHub Copilot Skills for Azure Container Apps Part 4

In Part 3, we gave GitHub Copilot a project identity. Custom instructions told it about our Zero Trust rules. Path-specific instructions scoped guidance to the right files. Prompt files turned repetitive tasks into one-click workflows. Custom agents gave it specialized personas with scoped tools.

But all of that is always on. Every Copilot conversation loads the custom instructions, whether you’re asking about networking or asking it to write a commit message. That’s fine when the instructions are small. It becomes a problem when your project grows and you accumulate rules for security reviews, cost analysis, state management, scaling, and image lifecycle. You can’t fit everything into `copilot-instructions.md` without turning it into a sprawling document that confuses the model as much as it helps it.

Skills solve this. They’re the on-demand counterpart to always-on instructions. A skill is a folder with a `SKILL.md` file that Copilot loads *only* when your prompt is relevant to it. Ask about drift? The drift detector skill loads. Ask about costs? The cost estimator loads. Ask about a new Container App resource? Copilot uses the base instructions and leaves the cost estimator alone.

This is Part 4, and it’s entirely about skills.

What skills actually are

The mental model is straightforward. Custom instructions are rules you want Copilot to follow always: your naming conventions, your provider version, your security posture. Skills are specialized knowledge packages you want available on demand: the deep expertise for a specific task that would clutter the context if it were always loaded.

Mechanically, a skill is a directory under `.github/skills/` with a single required file: `SKILL.md`. That file has two parts: a YAML frontmatter block and a Markdown body.

```
.github/
└── skills/
    └── my-skill-name/
        └── SKILL.md
```

The frontmatter defines the skill's identity:

```yaml
---
name: my-skill-name
description: >
  One sentence summary.

  When to use this skill:
  - "trigger phrase 1"
  - "trigger phrase 2"
---
```

The description field is doing a lot of work here. Copilot reads it to decide whether this skill is relevant to your current prompt. The “when to use” section isn’t documentation for humans; it’s signal for the model. Write it as a list of phrases someone would actually type when they need this skill. The more specific and realistic, the better the match.

The Markdown body is the skill’s actual content: instructions, workflows, code examples, lookup tables, templates. Whatever Copilot needs to execute the task well.
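
As a sanity check on that two-part shape, the sketch below splits a `SKILL.md` into frontmatter and body and flags the obvious omissions. It is a plain-text check written for this article, not how Copilot parses skills, and it assumes `name` and `description` are both expected because the template above uses them.

```python
def split_skill(text):
    """Split a SKILL.md document into (frontmatter, body).
    Frontmatter is the text between the first two '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter line")
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            frontmatter = "\n".join(lines[1:index])
            body = "\n".join(lines[index + 1:]).strip()
            return frontmatter, body
    raise ValueError("frontmatter is not closed with '---'")


def check_skill(text):
    """Return a list of problems; an empty list means the basic shape is fine."""
    frontmatter, body = split_skill(text)
    problems = []
    for field in ("name", "description"):
        if not any(line.startswith(f"{field}:") for line in frontmatter.splitlines()):
            problems.append(f"missing '{field}' in frontmatter")
    if not body:
        problems.append("empty Markdown body")
    return problems


sample = """---
name: my-skill-name
description: >
  One sentence summary.
---
# My Skill

Instructions go here.
"""
print(check_skill(sample))
```

A check like this fits naturally in CI, so a skill with a missing description never reaches the repository.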

Skills vs. the other tools in the toolbox

After three parts covering custom instructions, path-specific instructions, prompt files, and agents, it’s worth being precise about where skills fit.

| Tool | Location | When loaded | Best for |
|---|---|---|---|
| Custom instructions | `.github/copilot-instructions.md` | Every conversation | Universal rules (naming, security, provider version) |
| Path-specific instructions | `.github/instructions/*.instructions.md` | When editing matching files | File-type-specific guidance |
| Prompt files | `.github/prompts/*.prompt.md` | When you select them manually | Repeatable multi-step tasks |
| Custom agents | `.github/agents/*.agent.md` | When you select the agent | Specialized personas with tool access |
| **Skills** | **`.github/skills/*/SKILL.md`** | When prompt matches description | Deep expertise for specific tasks |

The key distinction from agents: agents are personas you *select*. Skills are knowledge packages that Copilot *discovers*. When you assign an issue to the coding agent with the implementation agent selected, that’s an explicit choice. When you ask “how much does this architecture cost?” and the cost estimator skill loads, that’s automatic.

Building five skills for this project

Let’s build a complete skill library for the Azure Container Apps infrastructure we’ve been assembling across Parts 1, 2, and 3. Each skill addresses a real operational need.

Skill 1: Terraform Drift Detector

Create `.github/skills/terraform-drift-detector/SKILL.md`:

---
name: terraform-drift-detector
description: >
  Detect, explain, and resolve Terraform state drift in this Azure Container Apps project.

  When to use this skill:
  - "Why does my terraform plan show unexpected changes?"
  - "Something changed in Azure but I didn't touch the Terraform"
  - "Detect drift", "check for drift", "why is plan not clean?"
  - "Azure Portal changes not reflected in state"
  - After manual changes via Azure CLI or Portal that weren't tracked in state
---

# Terraform Drift Detector

You are an expert in Terraform state management for Azure Container Apps infrastructure.
Your job is to detect, explain, and resolve drift between the Terraform state and actual Azure resources.

## What Is Drift

Drift happens when real Azure resources no longer match what Terraform's state file says they should be.
Common causes in this project:
- Manual changes via Azure Portal or CLI (e.g., scaling a Container App by hand)
- Azure auto-healing or auto-upgrading resources (e.g., Container App revision updates)
- Expiry or auto-rotation of identities
- Out-of-band certificate renewals on Application Gateway

## Detection Workflow

1. Read the current state using `#readFile` on `terraform.tfstate`
2. Run `terraform plan -detailed-exitcode`
   - Exit code 0 = no drift
   - Exit code 2 = drift found
3. Categorize each change by severity:
   - `external_enabled` flip on any Container App → 🔴 CRITICAL
   - `admin_enabled` change on ACR → 🔴 CRITICAL
   - Scaling values (min/max replicas) → 🟡 MEDIUM
   - Tag or label drifts → 🟢 LOW

## Reconciliation Rules

- NEVER auto-run `terraform apply` — show the plan first, ask for confirmation
- For CRITICAL: surface clearly, explain what probably happened, recommend remediation steps
- For LOW/MEDIUM: propose `terraform apply -target=<resource>` with the specific address

## Output Format

Present findings as a Drift Report with Critical Changes and Safe-to-Reconcile sections.

Why this skill matters. In a real team, someone will make a manual change in the Azure Portal, usually during an incident, under pressure. Terraform will then want to revert it. The worst case is a `terraform apply` that flips `external_enabled` from true to false on the frontend, taking it offline. The drift detector loads the right context to catch that before it happens.
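
The triage logic in the skill can be sketched in a few lines. This is an illustration of the rules, not something the skill executes: the resource addresses are hypothetical, exit code 1 (the plan itself failed) is standard Terraform behaviour added here for completeness, and the "Needs Review" bucket for attributes the skill does not rank is an assumption of this sketch.

```python
# Severity per changed attribute, taken from the skill's detection workflow.
SEVERITY_BY_ATTRIBUTE = {
    "external_enabled": "CRITICAL",
    "admin_enabled": "CRITICAL",
    "min_replicas": "MEDIUM",
    "max_replicas": "MEDIUM",
    "tags": "LOW",
}


def interpret_exit_code(code):
    """Meaning of the exit code from `terraform plan -detailed-exitcode`."""
    meanings = {0: "no drift", 1: "plan failed", 2: "drift found"}
    return meanings.get(code, "unexpected exit code")


def classify(attribute):
    """Rank an attribute path such as 'tags.env' by its top-level name."""
    root = attribute.split(".")[0]
    return SEVERITY_BY_ATTRIBUTE.get(root, "UNCLASSIFIED")


def drift_report(changes):
    """`changes` is a list of (resource_address, attribute) pairs."""
    report = {"Critical Changes": [], "Safe-to-Reconcile": [], "Needs Review": []}
    for address, attribute in changes:
        severity = classify(attribute)
        if severity == "CRITICAL":
            section = "Critical Changes"
        elif severity in ("MEDIUM", "LOW"):
            section = "Safe-to-Reconcile"
        else:
            section = "Needs Review"  # assumption: not defined by the skill
        report[section].append(f"{address}: {attribute} ({severity})")
    return report


changes = [
    ("azurerm_container_app.frontend", "external_enabled"),
    ("azurerm_container_app.backend", "min_replicas"),
    ("azurerm_container_registry.acr", "tags.env"),
]
for section, items in drift_report(changes).items():
    print(section, items)
```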

Skill 2: ACA Scaling Advisor

Create `.github/skills/aca-scaling-advisor/SKILL.md`:

---
name: aca-scaling-advisor
description: >
  Design, review, and optimize scaling rules for Azure Container Apps in this project.

  When to use this skill:
  - "Add scaling rules to my Container App"
  - "How should I scale the backend API?"
  - "My app is slow under load", "optimize scaling", "autoscale"
  - "Add KEDA scaler", "HTTP scaling", "queue-based scaling"
  - "min_replicas is too high", "I'm paying too much for idle containers"
---

# ACA Scaling Advisor

You are an expert in Azure Container Apps autoscaling using KEDA.
Architecture context: frontend is external-facing via Application Gateway,
backend is internal-only. Both run on Consumption workload profile.

## Key Rules

**Frontend** — safe for `min_replicas = 0`. Application Gateway handles the queue.
Use HTTP scaler with `concurrentRequests = 100`.

**Backend** — keep `min_replicas = 1` unless the frontend uses async calls.
Scale-to-zero on a synchronously-called backend means the frontend sees cold start latency.
Use CPU scaler at 70% utilization.

## HTTP Scaling (Frontend)

\`\`\`hcl
template {
  min_replicas = 0
  max_replicas = 10

  custom_scale_rule {
    name             = "http-scaler"
    custom_rule_type = "http"
    metadata = {
      concurrentRequests = "100"
    }
  }
}
\`\`\`

## CPU Scaling (Backend)

\`\`\`hcl
template {
  min_replicas = 1
  max_replicas = 5

  custom_scale_rule {
    name             = "cpu-scaler"
    custom_rule_type = "cpu"
    metadata = {
      type  = "Utilization"
      value = "70"
    }
  }
}
\`\`\`

Always validate: no conflicting scale rules, `max_replicas` fits your cost ceiling,
and `min_replicas ≥ 1` on synchronously-called services.

The scaling advisor encodes the architectural constraints of this specific project. A generic Copilot response might suggest `min_replicas = 0` on the backend because it reduces costs. That’s correct in isolation. It’s wrong here because the frontend calls the backend synchronously, so a cold start becomes frontend latency. The skill carries that context.
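
The validation rule at the end of the skill is simple enough to express directly. The sketch below is an illustration of that rule; `ScaleConfig` and `max_replica_ceiling` are hypothetical names, with the ceiling standing in for whatever replica count your cost limit allows.

```python
from dataclasses import dataclass


@dataclass
class ScaleConfig:
    """Scaling settings for one Container App (hypothetical structure)."""
    name: str
    min_replicas: int
    max_replicas: int
    called_synchronously: bool


def validate(config, max_replica_ceiling):
    """Return the problems found in one app's scaling settings."""
    problems = []
    if config.min_replicas > config.max_replicas:
        problems.append(f"{config.name}: min_replicas exceeds max_replicas")
    if config.max_replicas > max_replica_ceiling:
        problems.append(f"{config.name}: max_replicas is above the cost ceiling")
    if config.called_synchronously and config.min_replicas < 1:
        problems.append(f"{config.name}: scale-to-zero on a synchronously-called service")
    return problems


frontend = ScaleConfig("frontend", min_replicas=0, max_replicas=10, called_synchronously=False)
backend = ScaleConfig("backend", min_replicas=0, max_replicas=5, called_synchronously=True)

print(validate(frontend, max_replica_ceiling=10))
print(validate(backend, max_replica_ceiling=10))
```

The frontend passes with zero minimum replicas, while the same setting on the backend is flagged, which is exactly the distinction a generic answer misses.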

Skill 3: Azure Cost Estimator

Create `.github/skills/azure-cost-estimator/SKILL.md`:

---
name: azure-cost-estimator
description: >
  Estimate and break down monthly Azure costs for this Container Apps infrastructure.

  When to use this skill:
  - "How much does this architecture cost?"
  - "Estimate my Azure bill", "cost breakdown", "cost estimate"
  - "Is the Application Gateway expensive?"
  - "I want to reduce costs", "cheapest way to run this"
  - "What's the difference in cost between Consumption and Dedicated?"
  - Before adding new resources — estimate the cost impact first
---

# Azure Cost Estimator

Pricing reference for West Europe (adjust by ~10-20% for other regions):

| Resource | Config | Est. Monthly Cost |
|---|---|---|
| Application Gateway | Standard_v2 | ~$185 |
| Container Registry | Standard SKU | ~$20 |
| Container Apps (per app) | 0.25 vCPU, 0.5GiB, 8h/day | ~$15 |
| Public IP | Standard | ~$4 |
| Private DNS Zone | 1 zone | ~$1 |
| Log Analytics | <5GB/day ingestion | ~$0 |

**Important:** Application Gateway accounts for ~75% of the bill at this scale.
It costs ~$185/month regardless of traffic volume.

When asked for an estimate:
1. Read `.tf` files to identify all provisioned resources
2. Ask for region if not West Europe
3. Present the cost table with actual resource configs from Terraform
4. Highlight the biggest cost driver
5. Suggest one or two project-specific optimizations
6. Always point to the Azure Pricing Calculator for accurate quotes

Every project eventually hits the “what’s this costing us?” question. Without the skill, Copilot would give a generic answer that doesn’t account for the Application Gateway’s flat cost, the specific SKUs we chose, or the Consumption billing model. The skill bakes those numbers in.
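
The arithmetic behind that table is worth doing once by hand. The sketch below uses the estimates from the skill and assumes two Container Apps (frontend and backend), each at the table's per-app figure; the numbers are the article's rough estimates, not quotes.

```python
# Estimated monthly cost in USD, taken from the skill's pricing table.
# Assumption: two Container Apps, each at the ~$15 per-app estimate.
MONTHLY_USD = {
    "Application Gateway (Standard_v2)": 185,
    "Container Registry (Standard)": 20,
    "Container App: frontend": 15,
    "Container App: backend": 15,
    "Public IP (Standard)": 4,
    "Private DNS Zone": 1,
    "Log Analytics": 0,
}


def total_cost(costs):
    return sum(costs.values())


def biggest_driver(costs):
    """Return (resource, share_of_total) for the most expensive line item."""
    resource = max(costs, key=costs.get)
    return resource, costs[resource] / total_cost(costs)


total = total_cost(MONTHLY_USD)
resource, share = biggest_driver(MONTHLY_USD)
print(f"Total: ~${total}/month")
print(f"Biggest driver: {resource} at {share:.0%} of the bill")
```

Under those assumptions the total is about $240 a month and the Application Gateway is roughly 77% of it, in line with the ~75% figure in the skill.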

Skill 4: Security Posture Reviewer

Create `.github/skills/security-posture-reviewer/SKILL.md`:

---
name: security-posture-reviewer
description: >
  Review the Zero Trust security posture of this Azure Container Apps Terraform project.

  When to use this skill:
  - "Review my security", "security audit", "security check"
  - "Is this Zero Trust?", "what are my security gaps?"
  - "Check my NSG rules", "are my secrets safe?"
  - Before a production deployment or architecture review
  - After adding new resources — verify they follow the project's security rules
---

# Security Posture Reviewer

Review the project's Zero Trust compliance against this checklist.
Output ✅ or ❌ for each item after reading the Terraform files.

### Networking
- [ ] `internal_load_balancer_enabled = true` on Container App Environment
- [ ] ACA subnet CIDR is minimum `/23`
- [ ] NSG includes `GatewayManager` and `AzureLoadBalancer` inbound rules on AppGW subnet
- [ ] No `0.0.0.0/0` allow-all inbound rules (except those required by Azure platform)
- [ ] Private DNS zone linked to VNet with `registration_enabled = false`

### Container Apps
- [ ] Backend uses `external_enabled = false` (not just an NSG rule)
- [ ] No plaintext secrets in environment variables

### Container Registry
- [ ] `admin_enabled = false`
- [ ] SKU is Standard or Premium

### Identity and Credentials
- [ ] System-assigned Managed Identity enabled on both Container Apps
- [ ] `AcrPull` role assigned for each Container App's principal_id
- [ ] GitHub Actions uses workload identity federation (not service principal secrets)
- [ ] `terraform.tfstate` is NOT committed to the repository

## Common Gaps to Flag

**State file in repo**: contains resource IDs and output values. Add to `.gitignore` and use Azure Storage backend.

**Missing WAF policy**: Standard_v2 supports WAF but it's not enabled by default. Without it, no OWASP rule set protects the frontend from injection attacks.

**Log Analytics retention at default 30 days**: insufficient for incident response. Set `retention_in_days = 90`.

This skill is the automated version of the senior engineer’s pre-deployment checklist. It doesn’t just know generic security best practices; it knows this project’s security model and can check it against the actual Terraform files.
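
A few of those checklist items can be approximated with plain text matching, which is a useful mental model for what the skill asks Copilot to verify. The sketch below is deliberately crude: it only tests whether a setting appears anywhere in the Terraform source, not which resource it belongs to, and the `review` and `render` helpers are illustrative names.

```python
import re

# (label, pattern that should appear somewhere in the Terraform source)
CHECKS = [
    ("Environment uses internal load balancer", r"internal_load_balancer_enabled\s*=\s*true"),
    ("A Container App sets external_enabled = false", r"external_enabled\s*=\s*false"),
    ("ACR admin access disabled", r"admin_enabled\s*=\s*false"),
]


def review(terraform_source, tracked_files):
    """Return (label, passed) pairs for a subset of the checklist."""
    results = [
        (label, re.search(pattern, terraform_source) is not None)
        for label, pattern in CHECKS
    ]
    results.append(("terraform.tfstate is not committed", "terraform.tfstate" not in tracked_files))
    return results


def render(results):
    """Format results the way the skill asks: one mark per item."""
    return "\n".join(f"{'✅' if passed else '❌'} {label}" for label, passed in results)


source = """
internal_load_balancer_enabled = true
external_enabled = false
admin_enabled    = true
"""
print(render(review(source, tracked_files=["main.tf", "aca.tf"])))
```

The value of the skill over a script like this is that the model reads each resource block in context, so it can tell the backend's ingress apart from the frontend's.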

Skill 5: ACR Image Manager

Create `.github/skills/acr-image-manager/SKILL.md`:

---
name: acr-image-manager
description: >
  Manage container images in Azure Container Registry — tagging strategy, image promotion,
  cleanup, and updating Container App image references in Terraform.

  When to use this skill:
  - "Tag my image for production", "promote image from staging to prod"
  - "Clean up old images in ACR", "delete untagged manifests"
  - "Update the Container App to use image version X"
  - "How should I tag my Docker images?", "image versioning strategy"
  - "Purge images older than 30 days", "reduce ACR storage costs"
---

# ACR Image Manager

ACR was created with `admin_enabled = false`. All operations use RBAC, not admin credentials.
Always authenticate with `az acr login --name <acr-name>` using the user's Azure CLI identity.

## Tagging Strategy

Use semantic versioning with a build reference. Never rely on `latest` alone for production.

\`\`\`
<registry>/<image>:<version>-<build-ref>
\`\`\`

Examples:
- `acrab3k2m.azurecr.io/frontend:1.2.0-a3f9c1e`  ← immutable, for rollback
- `acrab3k2m.azurecr.io/frontend:latest`           ← mutable, for convenience

## Image Promotion (Staging → Production)

Use `az acr import` to copy between registries without pulling locally:

\`\`\`bash
az acr import \
  --name <target-registry> \
  --source <source-registry>/backend:1.2.0-a3f9c1e \
  --image backend:prod-1.2.0 \
  --registry <source-registry>
\`\`\`

## Updating Container App Image in Terraform

After promoting, update the `image` field in `aca.tf`:

\`\`\`hcl
image = "${azurerm_container_registry.acr.login_server}/backend:1.2.0-a3f9c1e"
\`\`\`

Then run `terraform plan` — only the image tag should change. Azure Container Apps creates a new revision automatically.

## Cleanup

Always dry-run before executing:

\`\`\`bash
az acr run \
  --registry <acr-name> \
  --cmd "acr purge --filter 'backend:.*' --untagged --ago 30d --dry-run" \
  /dev/null
\`\`\`

Remove `--dry-run` once the output looks right.

The image management skill is particularly useful after the GitHub Actions CI/CD pipeline from Part 3 starts pushing images. Without it, Copilot doesn’t know whether to suggest `az acr` commands, direct Docker commands, or Terraform changes. The skill normalizes that: use RBAC auth, use `az acr import` for promotion, update the specific `image` field in `aca.tf`.
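
The tagging convention is easy to enforce in a build script. This sketch assumes the build reference is a short Git commit SHA, which is what the example tags look like but is not stated outright in the skill; `build_tag` and `image_reference` are illustrative helpers.

```python
import re

# Semantic version, a dash, then a 7-character lowercase hex build reference.
TAG_PATTERN = re.compile(r"^\d+\.\d+\.\d+-[0-9a-f]{7}$")


def build_tag(version, git_sha):
    """Combine a semantic version with a short commit SHA into an immutable tag."""
    tag = f"{version}-{git_sha[:7]}"
    if not TAG_PATTERN.match(tag):
        raise ValueError(f"not a valid <version>-<build-ref> tag: {tag}")
    return tag


def image_reference(login_server, repository, tag):
    """Full image reference as used in the Container App's image field."""
    return f"{login_server}/{repository}:{tag}"


tag = build_tag("1.2.0", "a3f9c1e0b7d24c6f")
print(image_reference("acrab3k2m.azurecr.io", "frontend", tag))
```

Rejecting anything that is not `<version>-<build-ref>` means a bare `latest` can never be promoted by accident.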

How Copilot discovers and loads skills

You don’t invoke skills manually. When you type a prompt in Copilot Chat (or an issue body that gets routed to the coding agent), Copilot reads the `description` field of every skill in `.github/skills/` and decides which ones are relevant. If the match is strong, the `SKILL.md` content is injected into the context for that conversation.

This means the description field is actually the most important part of the file. Write it as if you’re writing the queries that should trigger it. The more concrete and realistic, the better.

A few things that improve match quality:

Be specific about trigger phrases. “Detect drift”, “check for drift”, and “why is plan not clean?” are all things a real engineer would type. Generic phrases like “help with infrastructure” are too broad — they’d match every skill and load them all.

Include anti-examples if needed. If two skills might get confused (say, cost estimator and scaling advisor both relate to “spending less money”), mention what each one doesn’t cover in the description.

Keep the body focused. A skill loaded into context costs tokens. If the body is bloated with tangential information, the model’s attention dilutes. Each skill should do one thing well.
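
To build intuition for why concrete trigger phrases help, here is a toy relevance score based on word overlap. It is emphatically not how Copilot selects skills, which is done by the model itself; `overlap_score` and `rank` are made-up helpers that only illustrate the effect of specific versus vague wording.

```python
import re


def words(text):
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))


def overlap_score(prompt, description):
    """Fraction of the prompt's words that also appear in the description."""
    prompt_words = words(prompt)
    if not prompt_words:
        return 0.0
    return len(prompt_words & words(description)) / len(prompt_words)


def rank(prompt, skills):
    """Return (skill_name, score) pairs, best match first."""
    scored = [(name, overlap_score(prompt, description)) for name, description in skills.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


SKILLS = {
    "terraform-drift-detector": "Detect drift, check for drift, why is plan not clean?",
    "azure-cost-estimator": "How much does this architecture cost? cost breakdown, cost estimate",
}

print(rank("check for drift", SKILLS))
```

Even in this crude model, a prompt that reuses a trigger phrase lands cleanly on one skill, while a description padded with generic words would score against almost any prompt.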

Your repo structure after Part 4

.github/
├── copilot-instructions.md            # Always-on: naming, provider, Zero Trust rules
├── agents/
│   ├── terraform-aca-implement.agent.md  # Specialist: writes and validates Terraform
│   └── terraform-aca-planning.agent.md   # Specialist: designs changes, creates plans
├── instructions/
│   ├── networking.instructions.md     # File-scoped: vnet.tf, nsg.tf, dns.tf
│   └── containers.instructions.md    # File-scoped: aca.tf, acr.tf
├── prompts/
│   ├── new-container-app.prompt.md   # One-click: scaffold a new Container App
│   └── terraform-review.prompt.md   # One-click: security review checklist
├── skills/
│   ├── terraform-drift-detector/
│   │   └── SKILL.md                 # On-demand: detect and resolve state drift
│   ├── aca-scaling-advisor/
│   │   └── SKILL.md                 # On-demand: design KEDA scaling rules
│   ├── azure-cost-estimator/
│   │   └── SKILL.md                 # On-demand: monthly cost breakdown
│   ├── security-posture-reviewer/
│   │   └── SKILL.md                 # On-demand: Zero Trust compliance check
│   └── acr-image-manager/
│       └── SKILL.md                 # On-demand: image tagging, promotion, cleanup
└── workflows/
    └── terraform.yml                # CI/CD: plan on PR, apply on merge

Each layer serves a different purpose. Always-on instructions keep every AI interaction aligned with your project’s conventions. Path-specific instructions add depth when editing specific file types. Prompt files make repetitive tasks one-click. Agents give you specialized personas. Skills provide deep expertise exactly when and only when you need it.

Where to find more skills

GitHub maintains the github/awesome-copilot repository with 247+ community-contributed skills. For Azure infrastructure work specifically, look at azure-architecture-autopilot (design and deploy Azure resources from natural language) and create-specification (generate structured spec files optimized for AI consumption). These are production-grade starting points — fork them, trim what you don’t need, add your project’s specific context.

One thing worth knowing: skills in awesome-copilot are organized by domain rather than by project. They’re designed to be generally useful, not specifically aware of your infrastructure. The value you get from writing your own is exactly that specificity — the drift detector that knows the difference between a MEDIUM and CRITICAL change for this architecture, the scaling advisor that knows this backend is called synchronously, the cost estimator that already has these SKUs and region prices loaded.

The full picture

Three parts built an infrastructure platform. Part 4 built the AI layer on top of it.

Start with what’s always relevant: custom instructions for project-wide rules that apply to every conversation. Add path-specific instructions for file types that have specialized concerns. Create prompt files for the tasks your team runs repeatedly. Build agents for the workflows that need specialized personas and specific tool access. Then write skills for the deep expertise that’s only needed sometimes: state drift, scaling design, cost analysis, security review, image lifecycle.

None of these are about making Copilot smarter in the abstract. They’re about making it useful for your specific project, in the way that a colleague who’s worked on the codebase for six months is useful: not because they have more general programming knowledge than a new hire, but because they know what matters here.

This is Part 4 of a series on building production-ready microservices on Azure Container Apps with Terraform and GitHub Copilot. Part 1 covers the secure networking baseline. Part 2 covers ACR, internal backend, and frontend-backend wiring. Part 3 covers AI-assisted automation, Managed Identities, and CI/CD.

From Terraform to Autopilot: AI-Assisted Automation for Azure Container Apps Part 3

achraf — Sun, 08 Mar 2026 23:24:43 +0000

Intro

Most infrastructure tutorials end where Part 2 left off: you have working Terraform, a microservices architecture, and a mental model of how the pieces fit together. Then reality hits. Someone on the team pushes a Terraform change without running `terraform validate`. A new engineer copies an old module and hardcodes a subnet CIDR. The backend Container App gets deployed with `external_enabled = true` because someone missed it in review.

Part 3 is about preventing those mistakes before they happen, and automating everything else so you don’t have to think about it.

We’re going to wire up three things that don’t usually appear in the same blog post: GitHub Copilot custom instructions (so AI understands your infrastructure conventions), GitHub Actions pipelines (so deployments are repeatable), and Managed Identities (so there are zero credentials in your codebase). By the end, your repo will be a self-documenting, self-deploying, self-securing system.

Where we left off

Quick recap. In Part 1, we built the networking foundation: a VNet, NSGs, an Application Gateway, and a single frontend Container App behind an internal load balancer. In Part 2, we expanded to microservices: an Azure Container Registry with admin access disabled, an internal-only backend Container App (`external_enabled = false`), and frontend-to-backend wiring through an environment variable.

The architecture works. But it has three gaps that would make any senior engineer uncomfortable in production.

First, the Container Apps are still pulling from Microsoft’s public registry (`mcr.microsoft.com`). Our ACR exists but nothing pushes to it and nothing pulls from it. Second, there are no credentials connecting ACR to the Container Apps because we disabled admin access (correctly), but haven’t set up the alternative yet. Third, deployments are manual. Every change requires someone to run `terraform plan` and `terraform apply` from their laptop.

Part 3 closes all three gaps.

Teaching your AI pair-programmer to think in Terraform

Before writing a single line of automation code, we’re going to do something that pays compounding dividends: teach GitHub Copilot how your project works.

If you’ve used Copilot in a Terraform project, you’ve probably noticed it generates syntactically correct HCL that’s architecturally wrong. It’ll suggest `admin_enabled = true` on a container registry because that’s what most tutorials do. It’ll hardcode resource group names instead of using references. It’ll create subnets without delegations because it doesn’t know you’re deploying Container Apps.

Custom instructions fix this by giving Copilot persistent context about your project’s rules and conventions.

The main instructions file

Create `.github/copilot-instructions.md` in your repo root. Copilot Chat in VS Code, Visual Studio, and JetBrains loads this file automatically, every time, for every conversation. No slash commands, no manual attachment.

Here’s what ours looks like for this project:

```markdown
## Project Context

This is a 3-part Azure Container Apps infrastructure project using Terraform.
The architecture follows Zero Trust principles with internal-only networking.

## Architecture Rules (ALWAYS follow these)

- Container App Environment uses internal load balancer (`internal_load_balancer_enabled = true`)
- Backend Container Apps MUST use `external_enabled = false` in their ingress block
- Only the frontend Container App may use `external_enabled = true`
- All traffic from the internet enters through the Application Gateway, never directly to a Container App
- Azure Container Registry MUST have `admin_enabled = false`; use Managed Identity with AcrPull role instead
- Container images are referenced via `azurerm_container_registry.acr.login_server`; never hardcode registry URLs

## Terraform Code Style

- Provider: `azurerm ~> 4.33.0` (do NOT suggest older versions)
- Use resource references (e.g., `azurerm_resource_group.rg.name`); never hardcode names
- Use `random_string.suffix.result` for globally unique resource names
- Every resource must include `resource_group_name` and `location` from `azurerm_resource_group.rg`
- Avoid deprecated arguments; check the latest azurerm provider docs
- Use `depends_on` only when Terraform cannot infer the dependency graph automatically

## Naming Conventions

- Resource groups: `rg-`
- VNets: `vnet-`
- Subnets: `snet-`
- NSGs: `nsg-`
- Container Apps: `ca-`
- Container App Environment: `cae-`

## Security Requirements

- No static credentials anywhere in the codebase
- ACR access via Managed Identity only (AcrPull role)
- Secrets go in Azure Key Vault, never in Terraform variables or environment variables
- NSG rules follow least-privilege: allow only the specific ports and CIDRs needed
```

This file is roughly 40 lines of Markdown. It takes five minutes to write. But now, every time someone on your team asks Copilot to “add a new Container App for the payment service,” the generated code will use internal ingress, reference the existing Container App Environment, follow your naming conventions, and avoid admin credentials.

The key insight here is that custom instructions aren’t about making Copilot smarter; they’re about making it *contextual*. Copilot already knows Terraform syntax. It doesn’t know your project’s rules.
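To make that concrete, here is a minimal sketch of what a rule-following suggestion for the registry could look like once the instructions are loaded. The resource labels `acr`, `rg`, and `suffix` come from the instructions file; the `acr` name prefix, the suffix length, and the `random` provider constraint are assumptions for illustration.

```hcl
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.33.0" # Pinned as the instructions file requires
    }
    random = {
      source  = "hashicorp/random"
      version = "~> 3.6" # Assumption: any recent 3.x release should work
    }
  }
}

# Lowercase alphanumeric suffix, because ACR names allow nothing else
resource "random_string" "suffix" {
  length  = 6 # Assumption: length chosen for illustration
  upper   = false
  special = false
}

resource "azurerm_container_registry" "acr" {
  name                = "acr${random_string.suffix.result}" # Hypothetical prefix
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
  sku                 = "Standard"
  admin_enabled       = false # No admin user; pulls go through Managed Identity
}
```

Nothing here is hardcoded except the SKU: the name, resource group, and location all come from references, which is the pattern the instructions push Copilot toward.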

Path-specific instructions for different concerns

The main instructions file covers project-wide rules. But Terraform projects have different zones of concern: networking files need different guidance than application files.

Create `.github/instructions/` and add focused instruction files:

`.github/instructions/networking.instructions.md`:

```markdown
---
applyTo: "**/vnet.tf,**/nsg.tf,**/dns.tf"
---

When working on networking files:

- Subnets for Container Apps MUST include a delegation block for `Microsoft.App/environments`
- The ACA subnet requires a minimum /23 CIDR range
- NSG rules: always include GatewayManager and AzureLoadBalancer inbound rules on the AppGW subnet
- Private DNS zones must be linked to the VNet with `registration_enabled = false`
```
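For reference, the first and third rules in that file translate to Terraform roughly as follows. This is a sketch, not the code from Part 1: the resource labels `aca`, `vnet`, and `appgw`, the subnet name, the address range, and the rule priorities are all hypothetical.

```hcl
# Subnet delegated to Container Apps (names and CIDR are hypothetical)
resource "azurerm_subnet" "aca" {
  name                 = "snet-aca"
  resource_group_name  = azurerm_resource_group.rg.name
  virtual_network_name = azurerm_virtual_network.vnet.name
  address_prefixes     = ["10.0.0.0/23"] # /23, matching the rule above

  delegation {
    name = "aca-delegation"

    service_delegation {
      name    = "Microsoft.App/environments"
      actions = ["Microsoft.Network/virtualNetworks/subnets/join/action"]
    }
  }
}

# Inbound rules the Application Gateway subnet needs for platform traffic
resource "azurerm_network_security_rule" "appgw_gateway_manager" {
  name                        = "AllowGatewayManagerInbound"
  priority                    = 100 # Hypothetical priority
  direction                   = "Inbound"
  access                      = "Allow"
  protocol                    = "Tcp"
  source_port_range           = "*"
  destination_port_range      = "65200-65535" # Management ports for the v2 SKU
  source_address_prefix       = "GatewayManager"
  destination_address_prefix  = "*"
  resource_group_name         = azurerm_resource_group.rg.name
  network_security_group_name = azurerm_network_security_group.appgw.name
}

resource "azurerm_network_security_rule" "appgw_load_balancer" {
  name                        = "AllowAzureLoadBalancerInbound"
  priority                    = 110 # Hypothetical priority
  direction                   = "Inbound"
  access                      = "Allow"
  protocol                    = "*"
  source_port_range           = "*"
  destination_port_range      = "*"
  source_address_prefix       = "AzureLoadBalancer"
  destination_address_prefix  = "*"
  resource_group_name         = azurerm_resource_group.rg.name
  network_security_group_name = azurerm_network_security_group.appgw.name
}
```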

`.github/instructions/containers.instructions.md`:

```markdown
---
applyTo: "**/aca.tf,**/acr.tf"
---

When working on container resources:

- Container Apps use `workload_profile_name = "Consumption"` unless dedicated compute is needed
- Backend services: `external_enabled = false`, `target_port = 80`, `transport = "auto"`
- Frontend services: inject backend URLs via `env` blocks, using the pattern `http://.ingress[0].fqdn`
- ACR: Standard SKU minimum. Never enable admin access. Future: connect via `registry` block with `identity = "System"`
- Use `min_replicas = 1` for always-on services, `min_replicas = 0` for event-driven scale-to-zero
```

The `applyTo` glob pattern is the magic: Copilot only loads these instructions when you’re editing files that match. So when you’re in `nsg.tf`, you get networking-specific guidance. When you’re in `aca.tf`, you get container-specific guidance. No noise, no irrelevant suggestions.
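As an illustration of the frontend rule in the containers file, this is roughly what the `env` wiring looks like in `aca.tf`. Treat it as a sketch: the app name `ca-frontend`, the container name, and the placeholder image are assumptions, while the resource labels `app`, `backend`, and `env` and the `BACKEND_API_URL` variable follow the names used elsewhere in this series.

```hcl
resource "azurerm_container_app" "app" {
  name                         = "ca-frontend" # Hypothetical name
  container_app_environment_id = azurerm_container_app_environment.env.id
  resource_group_name          = azurerm_resource_group.rg.name
  revision_mode                = "Single"
  workload_profile_name        = "Consumption"

  template {
    min_replicas = 1 # Always-on service

    container {
      name   = "frontend"
      image  = "mcr.microsoft.com/azuredocs/containerapps-helloworld:latest" # Placeholder image (assumption)
      cpu    = 0.25
      memory = "0.5Gi"

      # Backend URL is built from a reference, so it follows the backend wherever it lands
      env {
        name  = "BACKEND_API_URL"
        value = "http://${azurerm_container_app.backend.ingress[0].fqdn}"
      }
    }
  }

  ingress {
    external_enabled = true # Only the frontend may do this
    target_port      = 80

    traffic_weight {
      percentage      = 100
      latest_revision = true
    }
  }
}
```

Because the URL is an expression rather than a string literal, Terraform also infers the dependency on the backend app, with no `depends_on` needed.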

Reusable prompt files for common tasks

Your team probably does the same Terraform tasks repeatedly: adding a new Container App, creating a new subnet, writing an output block. Prompt files turn these into one-click workflows.

Create `.github/prompts/` and add prompt files for your most common operations.

`.github/prompts/new-container-app.prompt.md`:

```markdown
---
description: 'Scaffold a new Azure Container App following project conventions'
---

Create a new `azurerm_container_app` resource with these requirements:

1. Use the existing Container App Environment: `azurerm_container_app_environment.env`
2. Resource group and location from `azurerm_resource_group.rg`
3. Naming: `ca-` (ask me for the service name)
4. Workload profile: Consumption
5. Ingress: ask whether this is a backend (internal) or frontend (external) service
6. If backend: `external_enabled = false`, explain that this is only reachable within the CAE
7. If frontend: add `env` block for BACKEND_API_URL pointing to the backend's internal FQDN
8. Template: placeholder image from MCR, cpu 0.25, memory 0.5Gi
9. Add a corresponding output for the app's FQDN in main.tf

Follow the naming conventions and security rules in copilot-instructions.md.
```

`.github/prompts/terraform-review.prompt.md`:

```markdown
---
description: 'Review Terraform code for security and best practice violations'
---

Review the selected Terraform code and check for:

1. Security: Any `admin_enabled = true` on registries? Any `external_enabled = true` on backend services? Any hardcoded credentials or secrets?
2. References: Are all resource names, locations, and resource groups using Terraform references (not hardcoded strings)?
3. Deprecated arguments: Flag any arguments deprecated in azurerm ~> 4.33.0
4. Naming conventions: Do resource names follow the `ca-`, `snet-`, `nsg-`, `rg-` patterns?
5. Dependencies: Are `depends_on` blocks only used where Terraform can't infer the dependency?
6. Ingress rules: Is every backend Container App using `external_enabled = false`?

Output findings as a checklist with a pass or fail mark for each item.
```

In VS Code, you invoke these by typing `/` in Copilot Chat and selecting the prompt name. The prompt file becomes the instruction, and Copilot executes it in the context of your current workspace. It’s like having a senior engineer’s code review checklist that actually runs itself.

Custom agents: giving Copilot a Terraform specialty

Custom instructions and prompt files make Copilot *aware* of your project. Custom agents take this further: they create specialized personas with specific tools, focused expertise, and scoped permissions. Instead of one generalist Copilot, you can have a Terraform implementation agent, a planning agent, and a security review agent, each with its own toolset and behavioral rules.

What custom agents are

A custom agent is a Markdown file with YAML frontmatter, stored in `.github/agents/`. The frontmatter defines the agent’s name, description, and which tools it can access. The Markdown body contains the agent’s system instructions: its expertise, rules, and workflow. When you select a custom agent in Copilot Chat or assign it to an issue, the coding agent loads these instructions and operates as that specialized persona.

The file format is simple:

```markdown
---
description: "What this agent does"
tools: [list, of, tools]
---

# Agent Instructions

Your behavioral rules, expertise, and workflow go here.
```

The `tools` property is the interesting part. You can restrict an agent to read-only tools (for planning and review agents that shouldn’t modify code), or give it full access to edit files, run terminal commands, and fetch documentation. If you omit `tools` entirely, the agent gets access to everything.

Building a Terraform implementation agent for this project

Let’s build a custom agent tailored to our Azure Container Apps infrastructure. Create `.github/agents/terraform-aca-implement.agent.md`:

```markdown
---
description: "Creates and reviews Terraform for Azure Container Apps following project conventions, Zero Trust networking, and Managed Identity patterns."
tools: [execute/getTerminalOutput, execute/runInTerminal, read/readFile, read/terminalLastCommand, edit/createFile, edit/editFiles, search, web/fetch, todo]
---

# Azure Container Apps Terraform Implementation Specialist

You are an expert in Azure Container Apps infrastructure using Terraform.
This project follows Zero Trust principles with internal-only networking.

## Architecture rules (ALWAYS follow these)

- Container App Environment uses internal load balancer (`internal_load_balancer_enabled = true`)
- Backend Container Apps MUST use `external_enabled = false` in their ingress block
- Only the frontend Container App may use `external_enabled = true`
- All internet traffic enters through the Application Gateway, never directly to a Container App
- Azure Container Registry MUST have `admin_enabled = false`; use Managed Identity with AcrPull role
- Container images are referenced via `azurerm_container_registry.acr.login_server`; never hardcode registry URLs
- No static credentials anywhere; ACR access via system-assigned Managed Identity only

## Workflow

1. Review existing `.tf` files using `#search` before making changes
2. Write Terraform configurations using `#editFiles`
3. Break the user's request into actionable items using `#todos`
4. After creating or editing files, run: `terraform fmt`, `terraform validate`
5. Offer to run `terraform plan` but NEVER run it without explicit user confirmation
6. Prefer implicit dependencies over explicit `depends_on`
7. Remove dead code: unused variables, locals, and outputs

## Naming conventions

- Resource groups: `rg-`
- VNets: `vnet-`, Subnets: `snet-`
- NSGs: `nsg-`
- Container Apps: `ca-`
- Container App Environment: `cae-`

## Final checklist

- All resource names follow naming conventions and include appropriate tags
- No secrets or environment-specific values hardcoded
- AcrPull role assignments exist for every Container App with a system-assigned identity
- Backend services use `external_enabled = false`
- Provider version: `azurerm ~> 4.33.0`
- Generated Terraform validates cleanly and passes format checks
```

This agent encodes everything from our `copilot-instructions.md` plus implementation-specific behavior: it runs `terraform fmt` and `validate` after every edit, it tracks work with todos, it refuses to run `terraform plan` without asking first, and it checks for dead code. The `tools` list gives it file editing, terminal execution, and search, but not destructive operations.

Pairing it with a planning agent

For larger changes (adding a new microservice, refactoring the networking layer) you want a separate agent that plans *before* anyone writes code. Create `.github/agents/terraform-aca-planning.agent.md`:

```markdown
---
description: "Creates implementation plans for Azure Container Apps infrastructure changes. Read-only: does not modify Terraform files."
tools: [read/readFile, search, web/fetch, edit/createFile, edit/editFiles, todo]
---

# Azure Container Apps Infrastructure Planner

You create implementation plans for Terraform changes. You do NOT write Terraform code.

## Workflow

1. Review existing `.tf` files and understand the current architecture
2. Research Azure resource requirements using `#fetch` against Microsoft docs
3. Write a structured plan to `.terraform-planning-files/INFRA.{goal}.md`
4. The plan must list every resource, its dependencies, required variables, and outputs
5. Break implementation into phased tasks with clear acceptance criteria

## Plan structure

Each plan includes: an introduction summarizing the change, a resources section with YAML blocks defining each Azure resource (kind, module/provider, variables, outputs, dependencies), and phased implementation tasks with specific file-level actions.

## Constraints

- Only create or modify files under `.terraform-planning-files/`
- Do NOT modify `.tf` files; that's the implementation agent's job
- Always consult Microsoft docs for resource configurations
- Flag any changes that would affect networking (CIDR ranges, NSG rules) as requiring human review
```

The workflow becomes: assign a planning issue to the planning agent, review the generated plan, then assign the implementation issue to the implementation agent with a reference to the plan. The implementation agent reads the plan from `.terraform-planning-files/` and executes it.

Using agents with the coding agent

When you assign a GitHub issue to Copilot, you can select which custom agent handles it from a dropdown. The coding agent spins up an ephemeral GitHub Actions environment, loads the selected agent’s instructions and tools, reads your custom instructions and prompt files, makes changes, and opens a pull request.

For example, say your team needs a new internal microservice for notifications. You create an issue:

Issue title: Add internal Container App for notification service

Issue body:

```
Create a new Container App `ca-notification` in the existing Container App Environment.
This is a backend service; it should only be reachable internally.
Use the placeholder nginx image from MCR for now.
CPU: 0.25, Memory: 0.5Gi, min replicas: 1.
Add an output for the notification service FQDN.
```

You assign the issue to Copilot and select the ACA Terraform Implementation agent. The agent creates a `copilot/issue-42` branch, writes the resource in `aca.tf` with internal ingress, adds the identity block and `AcrPull` role assignment, runs `terraform fmt` and `terraform validate`, self-reviews its changes using Copilot code review, and opens a pull request. If you leave a comment like “@copilot also add an env block so the frontend can reach this service,” it picks up the feedback, pushes a new commit, and re-requests review.

Your CI pipeline runs `terraform plan` on the PR, so you can see exactly what the agent’s code would do to your infrastructure before approving.
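To give a sense of what lands in that pull request, here is a sketch of the Terraform the agent would be expected to produce for this issue. It is an illustration of the conventions, not captured agent output: the resource label `notification`, the container name, the output name, and the exact placeholder image are assumptions.

```hcl
resource "azurerm_container_app" "notification" {
  name                         = "ca-notification"
  container_app_environment_id = azurerm_container_app_environment.env.id
  resource_group_name          = azurerm_resource_group.rg.name
  revision_mode                = "Single"
  workload_profile_name        = "Consumption"

  identity {
    type = "SystemAssigned"
  }

  template {
    min_replicas = 1

    container {
      name   = "notification"
      image  = "mcr.microsoft.com/azuredocs/containerapps-helloworld:latest" # Stand-in for the MCR placeholder (assumption)
      cpu    = 0.25
      memory = "0.5Gi"
    }
  }

  ingress {
    external_enabled = false # Backend: reachable only inside the environment
    target_port      = 80
    transport        = "auto"

    traffic_weight {
      percentage      = 100
      latest_revision = true
    }
  }
}

# Lets the app pull from ACR once it moves off the placeholder image
resource "azurerm_role_assignment" "notification_acr_pull" {
  scope                = azurerm_container_registry.acr.id
  role_definition_name = "AcrPull"
  principal_id         = azurerm_container_app.notification.identity[0].principal_id
}

output "notification_fqdn" {
  description = "Internal FQDN of the notification service"
  value       = azurerm_container_app.notification.ingress[0].fqdn
}
```

When you review the real PR, these are the things to look for: internal ingress, an identity block with a matching `AcrPull` assignment, and no hardcoded resource group or environment names.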

Where to find more agents

GitHub maintains a curated collection of community agents at [github/awesome-copilot](https://github.com/github/awesome-copilot/tree/main/agents), with over 170 agent profiles covering everything from Azure infrastructure to security scanning, database administration, and code review. For Terraform specifically, look at `terraform.agent.md`, `terraform-azure-implement.agent.md`, `terraform-azure-planning.agent.md`, and `terraform-iac-reviewer.agent.md`. These are production-grade starting points that you can fork and customize for your project’s conventions.

The tasks where you still want a human: anything involving networking changes (CIDR ranges, NSG rules), provider version upgrades, or state-sensitive operations like resource renames. The agent doesn’t have access to your Terraform state, so it can’t predict plan output.

Managed Identities: killing the last static credential

In Part 2, we created an ACR with `admin_enabled = false` and left a comment saying “connect via Managed Identity in Part 3.” Here’s that connection.

The idea is simple: instead of giving Container Apps a username and password to pull images from ACR, we give them a Microsoft Entra ID identity and assign it the `AcrPull` role. The identity is managed by Azure: it’s created, rotated, and destroyed automatically. There’s nothing to store, nothing to leak.

Here’s the Terraform to add. First, enable system-assigned managed identity on both Container Apps by adding an `identity` block:

```hcl
identity {
  type = "SystemAssigned"
}
```

Then create role assignments that grant each Container App’s identity the `AcrPull` permission on the ACR:

```hcl
resource "azurerm_role_assignment" "frontend_acr_pull" {
  scope                = azurerm_container_registry.acr.id
  role_definition_name = "AcrPull"
  principal_id         = azurerm_container_app.app.identity[0].principal_id
}

resource "azurerm_role_assignment" "backend_acr_pull" {
  scope                = azurerm_container_registry.acr.id
  role_definition_name = "AcrPull"
  principal_id         = azurerm_container_app.backend.identity[0].principal_id
}
```

Finally, update the Container App definitions to use the `registry` block instead of pulling from a public registry:

```hcl
registry {
  server   = azurerm_container_registry.acr.login_server
  identity = "System"
}
```

The `identity = "System"` parameter tells Azure Container Apps to authenticate to the ACR using its own system-assigned identity. No passwords, no tokens, no environment variables. The authentication is handled entirely within the Azure control plane.

This also means your Container Apps will fail to start if the role assignment is missing or wrong, which is exactly the behavior you want. Fail closed, not open. If someone accidentally removes the `AcrPull` assignment, the app doesn’t silently fall back to anonymous access; it refuses to pull the image.
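The fragments above are easier to place when you see them inside a full resource. Here is a sketch of the backend app with both blocks in position; the app name, container name, and the `backend:latest` repository and tag are hypothetical, so substitute whatever your pipeline pushes to ACR.

```hcl
resource "azurerm_container_app" "backend" {
  name                         = "ca-backend" # Hypothetical name
  container_app_environment_id = azurerm_container_app_environment.env.id
  resource_group_name          = azurerm_resource_group.rg.name
  revision_mode                = "Single"
  workload_profile_name        = "Consumption"

  # Top-level block: gives the app its own Entra ID identity
  identity {
    type = "SystemAssigned"
  }

  # Top-level block (not inside template): tells the app how to reach ACR
  registry {
    server   = azurerm_container_registry.acr.login_server
    identity = "System"
  }

  template {
    min_replicas = 1

    container {
      name   = "backend"
      image  = "${azurerm_container_registry.acr.login_server}/backend:latest" # Hypothetical repository and tag
      cpu    = 0.25
      memory = "0.5Gi"
    }
  }

  ingress {
    external_enabled = false
    target_port      = 80
    transport        = "auto"

    traffic_weight {
      percentage      = 100
      latest_revision = true
    }
  }
}
```

One ordering detail to watch: the role assignment reads the app’s `principal_id`, which only exists once the app has been created, while the app needs the role to pull its first image from ACR. If a first apply fails on the image pull for that reason, re-running after the role assignment exists, or switching the image to ACR in a second change, are two ways around it.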

GitHub Actions: from `git push` to production

The final piece is a CI/CD pipeline that runs `terraform plan` on pull requests and `terraform apply` on merge to main. We’re using OIDC (OpenID Connect) to authenticate GitHub Actions to Azure, with no service principal secrets stored in GitHub.

Setting up workload identity federation

This is Microsoft’s recommended approach for authenticating external workloads, like a GitHub Actions runner, to Azure without storing any secrets. Microsoft calls it workload identity federation, and it’s built on top of OpenID Connect (OIDC).

Here’s what’s happening under the hood. GitHub Actions has a built-in OIDC provider at `https://token.actions.githubusercontent.com`. Every time a workflow run needs to authenticate, GitHub issues a short-lived JSON Web Token (JWT) that contains claims about the workflow: which repository triggered it, which branch, which environment. That token is valid for minutes, not months.

On the Azure side, you create an app registration (or a user-assigned managed identity) in Microsoft Entra ID (formerly Azure AD) and add a federated identity credential to it. This credential tells Entra ID: “trust tokens from GitHub’s OIDC provider, but only when the subject claim matches this specific repository and branch.” The audience is set to `api://AzureADTokenExchange`.

At runtime, the `azure/login` action in your workflow exchanges the GitHub-issued JWT for a short-lived Azure access token via the Microsoft identity platform. The access token is scoped to your subscription and expires quickly. There’s no service principal secret, no client certificate, nothing to rotate or leak.

You’ll need to set up three things:

1. App registration in Entra ID: go to the Microsoft Entra admin center → App registrations → New registration. This creates the identity your GitHub workflow will authenticate as. Assign it a role (like `Contributor`) on your subscription or resource group.

2. Federated identity credential: on the app registration, go to Certificates & secrets → Federated credentials → Add credential. Select “GitHub Actions” as the scenario. Set the organization, repository, and entity type (branch, environment, or tag). For production deployments, use the `environment` entity type pointed at a GitHub Environment with required reviewers. This is more secure than branch-based subjects because it ensures a human approved the deployment.

3. GitHub repository secrets: store three values: `AZURE_CLIENT_ID` (the app registration’s Application ID), `AZURE_TENANT_ID` (your Entra ID tenant), and `AZURE_SUBSCRIPTION_ID`. These are identifiers, not credentials. If someone exfiltrates them, they get three GUIDs that are useless without a valid OIDC token from your specific GitHub repository, branch, and environment.
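If you would rather keep this setup in code than in the portal, the same trust can be sketched in Terraform using the user-assigned managed identity variant. Everything named here is hypothetical: the identity name, the `my-org/my-repo` repository, and the choice to scope `Contributor` to the resource group. It would typically be applied once from a trusted workstation, since the pipeline cannot create the identity it logs in with.

```hcl
# Identity that GitHub Actions will authenticate as (hypothetical name)
resource "azurerm_user_assigned_identity" "github" {
  name                = "id-github-terraform"
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
}

locals {
  # One subject per way the workflow can run; replace my-org/my-repo with your own
  github_subjects = {
    production = "repo:my-org/my-repo:environment:production"  # apply job
    pr         = "repo:my-org/my-repo:pull_request"            # plan on pull requests
    main       = "repo:my-org/my-repo:ref:refs/heads/main"     # plan on push to main
  }
}

# Trust GitHub's OIDC tokens, but only for the subjects listed above
resource "azurerm_federated_identity_credential" "github" {
  for_each            = local.github_subjects
  name                = "github-${each.key}"
  resource_group_name = azurerm_resource_group.rg.name
  parent_id           = azurerm_user_assigned_identity.github.id
  issuer              = "https://token.actions.githubusercontent.com"
  audience            = ["api://AzureADTokenExchange"]
  subject             = each.value
}

# Assumption: Contributor on the resource group is enough for this project
resource "azurerm_role_assignment" "github_contributor" {
  scope                = azurerm_resource_group.rg.id
  role_definition_name = "Contributor"
  principal_id         = azurerm_user_assigned_identity.github.principal_id
}

# Value to store as the AZURE_CLIENT_ID repository secret
output "github_client_id" {
  value = azurerm_user_assigned_identity.github.client_id
}
```

The subject map is the part worth reading twice. A job that runs in the `production` environment presents a different subject than a job triggered by a pull request, so each trigger you want to authenticate needs its own federated credential.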

The pipeline

Here’s the structure of the workflow file (`.github/workflows/terraform.yml`):

```yaml
on:
  pull_request:
    branches: [main]
    paths: ['**.tf']
  push:
    branches: [main]
    paths: ['**.tf']

permissions:
  id-token: write # Required for OIDC
  contents: read
  pull-requests: write # Post plan output to PR

concurrency:
  group: terraform-${{ github.ref }}
  cancel-in-progress: false

jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: '1.8.0'
      - uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
      - name: Terraform Init
        run: terraform init
      - name: Terraform Format Check
        run: terraform fmt -check -recursive
      - name: Terraform Validate
        run: terraform validate
      - name: Terraform Plan
        run: terraform plan -out=tfplan -no-color
      # Post plan summary to PR as a comment
      - name: Comment Plan on PR
        if: github.event_name == 'pull_request'
        uses: actions/github-script@v7
        with:
          script: |
            // Post truncated plan output to PR comment

  apply:
    needs: plan
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
    runs-on: ubuntu-latest
    environment: production # Requires approval
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
      - run: terraform init
      - run: terraform apply -auto-approve
```

A few design decisions worth calling out.

The `concurrency` block prevents two Terraform runs from executing simultaneously on the same branch. Terraform state is not designed for concurrent writes, and parallel runs can corrupt it. The `cancel-in-progress: false` setting means new runs queue instead of canceling in-flight ones, because you never want to interrupt a `terraform apply` mid-execution.
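Queuing runs only helps if every run reads and writes the same state, and a GitHub-hosted runner starts from a clean disk each time. A minimal sketch of a shared backend, assuming a storage account you have created separately (all four names below are hypothetical):

```hcl
terraform {
  backend "azurerm" {
    resource_group_name  = "rg-tfstate"            # Hypothetical resource group
    storage_account_name = "sttfstateexample"      # Hypothetical, must be globally unique
    container_name       = "tfstate"               # Hypothetical blob container
    key                  = "aca.terraform.tfstate" # Blob name for this project's state
    use_oidc             = true                    # Reuse the GitHub OIDC token, no storage key
    use_azuread_auth     = true                    # Access blobs with Entra ID instead of access keys
  }
}
```

With `use_azuread_auth`, the identity the workflow logs in as also needs a data-plane role on the storage account, such as Storage Blob Data Contributor. The backend takes a lease on the state blob during a run, which gives you locking on top of the workflow-level `concurrency` guard.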

The `paths` filter ensures the pipeline only triggers when `.tf` files change. A README update shouldn’t trigger an infrastructure deployment.

The `environment: production` on the apply job means someone has to click “Approve” in the GitHub UI before `terraform apply` runs. This is your human gate: the pipeline does the mechanical work (format, validate, plan), and a human makes the final call on whether the plan looks right.

This is the workload identity federation model in action: the only values in your GitHub secrets are non-sensitive identifiers. The actual authentication happens at runtime through the OIDC token exchange described above, and every token is short-lived and scoped.
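The workflow above leaves the provider to pick up whatever session `azure/login` established. Another option, sketched here as an assumption rather than something this series configures, is to have the azurerm provider perform the OIDC exchange itself. It relies on the workflow exporting `ARM_CLIENT_ID`, `ARM_TENANT_ID`, and `ARM_SUBSCRIPTION_ID` as environment variables from the same three secrets.

```hcl
provider "azurerm" {
  features {}

  # Exchange the GitHub-issued token directly instead of relying on an az CLI session.
  # Assumes ARM_CLIENT_ID, ARM_TENANT_ID, and ARM_SUBSCRIPTION_ID are set in the job environment.
  use_oidc = true
}
```

Either way, nothing sensitive is written into `provider.tf`. Note that azurerm 4.x expects a subscription ID to be supplied explicitly, so `ARM_SUBSCRIPTION_ID` (or a `subscription_id` argument) needs to be present whichever authentication path you choose.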

What your repo looks like after Part 3

```
├── .github/
│   ├── copilot-instructions.md                  # Project-wide AI context
│   ├── agents/
│   │   ├── terraform-aca-implement.agent.md     # Implementation specialist
│   │   └── terraform-aca-planning.agent.md      # Planning specialist
│   ├── instructions/
│   │   ├── networking.instructions.md           # Path-specific: vnet, nsg, dns
│   │   └── containers.instructions.md           # Path-specific: aca, acr
│   ├── prompts/
│   │   ├── readme.prompt.md                     # Generate README (from Part 1)
│   │   ├── new-container-app.prompt.md          # Scaffold new Container App
│   │   └── terraform-review.prompt.md           # Security review checklist
│   └── workflows/
│       └── terraform.yml                        # CI/CD pipeline
├── aca.tf       # Frontend + Backend Container Apps (with identity blocks)
├── acr.tf       # Container Registry
├── appg.tf      # Application Gateway
├── dns.tf       # Private DNS
├── main.tf      # Outputs, Log Analytics, role assignments
├── nsg.tf       # Network Security Groups
├── provider.tf  # Provider config
├── rg.tf        # Resource Group
└── vnet.tf      # Virtual Network
```

The security posture, end to end

Let’s step back and look at what three parts of Terraform and a few config files got us.

The networking layer is locked down. Container Apps sit behind an internal load balancer in a delegated subnet. The only public entry is through the Application Gateway, which terminates HTTP and forwards to the frontend’s internal FQDN via private DNS.

The application layer follows Zero Trust. The backend is platform-isolated: `external_enabled = false` means no load balancer rule exists, so there’s no path to misconfigure. The frontend reaches the backend via its internal FQDN, which resolves only within the Container App Environment.

The identity layer has zero static credentials. ACR admin is disabled. Container Apps authenticate to ACR using system-assigned Managed Identities with the `AcrPull` role. GitHub Actions authenticates to Azure using workload identity federation via OIDC: no service principal secrets, just short-lived tokens exchanged at runtime through Microsoft Entra ID.

The human layer is AI-augmented. Copilot custom instructions encode your project’s security rules, so AI-generated code follows them by default. Custom agents specialize Copilot into focused personas: a planning agent that researches and designs, and an implementation agent that writes and validates code, each with scoped tools and guardrails. Prompt files automate code reviews that catch violations before they reach a PR. The CI/CD pipeline runs format, validate, and plan on every pull request, with mandatory approval before apply.

No admin passwords. No static credentials. No manual deployments. No AI-generated code that doesn’t understand your architecture.

That’s the end state of Part 3, and the end of this series.

What to explore next

This project is a solid foundation, but production environments always have more knobs to turn. A few ideas worth exploring: remote state with Azure Storage and state locking, Azure Key Vault integration for application secrets (database connection strings, API keys), custom domains with managed TLS certificates on the frontend, Dapr integration for service-to-service communication (sidecars, pub/sub, state management), and monitoring with Azure Application Insights wired through the existing Log Analytics workspace.

If you’ve followed along through all three parts, you have a microservices platform that’s network-isolated, identity-secured, AI-assisted, and pipeline-deployed. That’s not a tutorial; that’s a production foundation.

—

*This is Part 3 of a 3-part series (more parts may be added in the future) on building production-ready microservices on Azure Container Apps with Terraform. [Part 1](./README.md) covers the secure networking baseline. [Part 2](./blog-part-2-microservices.md) covers ACR, internal backend, and frontend-backend wiring.*