Implementing MCP OAuth 2.1 with Keycloak on AWS

Introduction

The Model Context Protocol (MCP) ecosystem mandates OAuth 2.1-compliant authorization servers to facilitate secure, federated access to AI model services. MCP clients, such as Claude Code, Cursor, and VS Code extensions, rely on modern OAuth specifications including Dynamic Client Registration (RFC 7591), PKCE (RFC 7636), and crucially, Resource Indicators (RFC 8707) for audience-restricted tokens.

However, most Identity-as-a-Service (IDaaS) providers, including the open-source Keycloak platform, currently lack full RFC 8707 support. Keycloak, while robust in OAuth 2.0 capabilities, employs a proprietary audience parameter in contrast to the standardized resource parameter defined in RFC 8707. For a comprehensive analysis of this compatibility landscape, refer to my previous post: Technical Deconstruction of MCP Authorization: A Deep Dive into OAuth 2.1 and IETF RFC Specifications.

This article provides a detailed guide on configuring Keycloak as an MCP-compatible authorization server through strategic use of protocol mappers and realm configuration. The implemented solution encompasses:

  • RFC 8707 Workaround: Custom audience protocol mappers to inject correct aud claims into JWT tokens.
  • Dynamic Client Registration: Automated client onboarding via realm default scopes.
  • Zero-Configuration MCP Support: Automatic audience restriction without manual client configuration.
  • Infrastructure Automation: Terraform deployment on AWS utilizing ECS Fargate and Aurora PostgreSQL.

Upon completion of this guide, you will possess a clear understanding of how to configure Keycloak for seamless MCP client support, enabling dynamic client registration with automated audience restriction.

Architecture Overview

This deployment leverages AWS managed services to establish a scalable Keycloak infrastructure tailored for MCP OAuth workflows.

Core Components

Compute Layer (ECS Fargate)

Keycloak operates as containerized workloads on AWS Fargate, offering managed compute capacity:

  • Custom Docker Image: Built from the official Keycloak 26.4.4 release, pre-configured with JDBC_PING for clustering.
  • Multi-AZ Deployment: Tasks are strategically distributed across multiple Availability Zones for resilience.
  • Health Monitoring: Integrated with AWS CloudWatch Container Insights for robust performance and health visibility.

Database Layer (Aurora PostgreSQL)

Amazon Aurora provides a highly available, scalable PostgreSQL-compatible database backend:

  • Database Engine: PostgreSQL 16 (Keycloak 26.4.4 requires PostgreSQL 13+ minimum, 16.8 recommended).
  • Scalability: Aurora Serverless v2, featuring configurable capacity and auto-scaling.
  • High Availability: Multi-AZ deployment with automatic failover mechanisms.
  • Security: Data encryption at rest and automated backup procedures.

Load Balancing (Application Load Balancer)

The Application Load Balancer (ALB) manages TLS termination and intelligent traffic distribution:

  • HTTPS/TLS: Certificate management handled by AWS Certificate Manager (ACM).
  • Health Checks: Continuously monitors Keycloak health endpoints to ensure service availability.
  • Session Affinity: Supports sticky sessions for maintaining stateful client connections.

Networking Infrastructure

A Virtual Private Cloud (VPC) provides a logically isolated network environment:

  • Subnets: Public and private subnets distributed across multiple Availability Zones.
  • NAT Gateways: Enable secure outbound internet access for resources within private subnets.
  • VPC Endpoints: Facilitate private connectivity to select AWS services.
  • Security Groups: Enforce granular network access controls.

Deployment Workflow

The infrastructure deployment adheres to a phased approach:

flowchart TD
    A[Create VPC & Networking] --> B[Deploy Aurora RDS]
    B --> C[Create ECS Cluster]
    C --> D[Build & Push Container Image]
    D --> E[Start ECS Tasks]
    E --> F[Configure MCP OAuth Realm]
    F --> G[Verify Dynamic Client Registration]

This structured methodology ensures that foundational infrastructure is provisioned prior to implementing MCP-specific Keycloak configurations.

Understanding Keycloak's RFC 8707 Gap

To comprehend the necessity of custom configuration in this deployment, we must analyze the incompatibility between Keycloak's audience implementation and the MCP specification's requirements.

The RFC 8707 Standard

RFC 8707 (Resource Indicators for OAuth 2.0) specifies a standardized mechanism for audience restriction within OAuth access tokens. This specification introduces a resource parameter, which clients include in both authorization and token requests:

POST /token HTTP/1.1
Host: auth.example.com
Content-Type: application/x-www-form-urlencoded

grant_type=authorization_code
&code=ABC123
&redirect_uri=https://client.example.com/callback
&resource=https://api.example.com    ← Target audience
&client_id=CLIENT_ID

The Authorization Server (AS) utilizes this resource parameter to populate the JWT's aud (audience) claim, thereby ensuring the token's validity is restricted to the specified Resource Server (RS).
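
To make the effect concrete, the resulting access token should carry the resource value in its aud claim. A minimal shell sketch for inspecting a token's payload (assumes the raw JWT is in $ACCESS_TOKEN and that base64 and jq are available):

# Decode the payload segment (second dot-separated part) of the access token
payload=$(printf '%s' "$ACCESS_TOKEN" | cut -d '.' -f2 | tr '_-' '/+')
# Re-pad base64url to a multiple of four characters before decoding
while [ $(( ${#payload} % 4 )) -ne 0 ]; do payload="${payload}="; done
printf '%s' "$payload" | base64 -d | jq '{iss, aud, exp, scope}'

# Abridged expected output when the AS honors the resource parameter:
# {
#   "iss": "https://auth.example.com",
#   "aud": "https://api.example.com",
#   "exp": 1735689600,
#   "scope": "openid profile"
# }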

Keycloak's Proprietary Approach

Keycloak's audience functionality was implemented prior to the publication of RFC 8707 in February 2020. As detailed in the MCP authorization compatibility matrix, Keycloak employs a proprietary audience parameter that predates the standardized approach.

The Problem: MCP clients (e.g., Claude Code, Cursor, VS Code extensions) adhere to RFC 8707 and transmit the resource parameter. Keycloak, however, disregards this parameter, resulting in JWT tokens that either lack the mandatory aud claim or contain incorrect audience values.

The Consequence: MCP servers validate the aud claim to mitigate token replay attacks, addressing the "Confused Deputy" problem. Without proper audience restriction, tokens risk rejection or potential misuse across disparate resource servers.
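
On the resource-server side, the audience check itself is a simple comparison once the token's signature, issuer, and expiry have been verified. A rough shell sketch of just the aud comparison ($CLAIMS is assumed to hold the already-verified, decoded payload as JSON; a real MCP server performs this as part of full JWT validation in its own language):

EXPECTED_AUD="https://mcp-server.example.com/mcp"   # this server's canonical identifier

# $CLAIMS holds the decoded, signature-verified JWT payload as JSON
if printf '%s' "$CLAIMS" | jq -e --arg aud "$EXPECTED_AUD" \
     'if (.aud | type) == "array" then (.aud | index($aud)) != null else .aud == $aud end' \
     > /dev/null; then
  echo "aud matches this resource server - accept the token"
else
  echo "aud mismatch - reject with HTTP 401" >&2
fi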

The Workaround Architecture

The solution uses Keycloak's Protocol Mappers to automatically inject the correct aud claim, working around the absence of native RFC 8707 support. The architecture combines three components:

flowchart LR
    A[Dynamic Client Registration] --> B{Realm Default Scopes}
    B --> C[Auto-assign mcp:run scope]
    C --> D[mcp:run has Audience Mapper]
    D --> E[Token Request]
    E --> F{Mapper Active?}
    F -->|Yes| G[Inject aud claim]
    G --> H[JWT with correct audience]
    H --> I[MCP Server validates aud]
    I --> J[Access Granted]

    style D fill:#f59e0b
    style G fill:#10b981
    style I fill:#3b82f6

Component 1: Audience Protocol Mapper

A hardcoded claim mapper, associated with the mcp:run client scope, injects the MCP server's URL into the aud claim:

resource "keycloak_openid_hardcoded_claim_protocol_mapper" "mcp_run_audience_mapper" {
  realm_id        = keycloak_realm.mcp.id
  client_scope_id = keycloak_openid_client_scope.mcp_run.id
  name            = "mcp-audience"

  claim_name       = "aud"
  claim_value      = var.resource_server_uri  # e.g., "https://mcp-server.example.com/mcp"
  claim_value_type = "String"

  add_to_id_token     = false
  add_to_access_token = true   # ← Critical: Only in access tokens
  add_to_userinfo     = false
}

Component 2: Realm Default Scopes

By configuring mcp:run as a realm-wide default scope, all clients, including those registered via Dynamic Client Registration, automatically inherit this audience mapper:

resource "keycloak_realm_default_client_scopes" "mcp_realm_defaults" {
  realm_id = keycloak_realm.mcp.id

  default_scopes = [
    "profile",
    "email",
    "mcp:run",    # ← Critical: Auto-assigned to DCR clients
    "roles",
    "web-origins",
    "acr",
    "basic",
  ]
}

Component 3: DCR Allowed Scopes Configuration

Client Registration Policies are configured to permit mcp:run within the allowed scopes for dynamically registered clients. This step is performed using the Keycloak Admin REST API due to current Terraform provider limitations:

# Extract Client Registration Policy component ID
COMPONENT_ID=$(curl -s "${KEYCLOAK_URL}/admin/realms/mcp/components" \
  -H "Authorization: Bearer ${ADMIN_TOKEN}" | \
  jq -r '.[] | select(.name=="Allowed Client Scopes") | .id')

# Update allowed scopes to include mcp:run
curl -X PUT "${KEYCLOAK_URL}/admin/realms/mcp/components/${COMPONENT_ID}" \
  -H "Authorization: Bearer ${ADMIN_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "config": {
      "allow-default-scopes": ["true"],
      "allowed-client-scopes": ["openid", "profile", "email", "mcp:run"]
    }
  }'
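
The ${ADMIN_TOKEN} used above is a short-lived Keycloak admin access token. One way to obtain it, assuming the bootstrap admin user and the built-in admin-cli client (URL and credentials are illustrative):

# Illustrative: mint a short-lived admin token with the built-in admin-cli client
KEYCLOAK_URL="https://auth.example.com/auth"

ADMIN_TOKEN=$(curl -s "${KEYCLOAK_URL}/realms/master/protocol/openid-connect/token" \
  -d "grant_type=password" \
  -d "client_id=admin-cli" \
  -d "username=admin" \
  -d "password=${ADMIN_PASSWORD}" | jq -r '.access_token')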

Complete Flow of Operations

When an MCP client (e.g., Claude Code) attempts to access a protected MCP server:

  1. Discovery: The client retrieves the MCP server's metadata (RFC 9728) to identify the required Authorization Server.
  2. Registration: The client dynamically registers with Keycloak via a POST request to /clients-registrations/openid-connect.
  3. Automatic Scope Inheritance: Keycloak automatically assigns the mcp:run scope (due to realm default configuration) to the newly registered client.
  4. Authorization Flow: The client initiates the OAuth Authorization Code flow, incorporating PKCE.
  5. Token Issuance: Keycloak generates a JWT access token, and the audience mapper injects the aud: "https://mcp-server.example.com/mcp" claim.
  6. Validation: The MCP server validates the aud claim against its own identifier and grants access to the MCP resources.

Result: The MCP client achieves full functionality without requiring any manual configuration within the Keycloak administrative console. This pattern of realm default scopes combined with an audience mapper establishes fully automated MCP compatibility.
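
Before pointing a real MCP client at the realm, the registration half of this flow can be spot-checked from the command line. A rough sketch using anonymous DCR (client name and redirect URI are illustrative):

KEYCLOAK_URL="https://auth.example.com/auth"

# Register a client anonymously via Dynamic Client Registration (RFC 7591)
curl -s -X POST "${KEYCLOAK_URL}/realms/mcp/clients-registrations/openid-connect" \
  -H "Content-Type: application/json" \
  -d '{
    "client_name": "dcr-smoke-test",
    "redirect_uris": ["http://localhost:33418/callback"],
    "grant_types": ["authorization_code"],
    "token_endpoint_auth_method": "none"
  }' | jq '{client_id, registration_access_token}'

# Completing the authorization code + PKCE flow with this client_id should then
# yield an access token whose aud is "https://mcp-server.example.com/mcp".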

MCP OAuth 2.1 Configuration Deep Dive

This section details the Terraform configurations that transform a standard Keycloak deployment into an MCP-compliant authorization server.

RFC Compliance Matrix

The implementation ensures OAuth 2.1 compatibility through selective RFC adoption:

RFC      | Specification                      | Implementation Status    | Notes
---------|------------------------------------|--------------------------|----------------------------------------------------------------
RFC 7591 | Dynamic Client Registration        | ✅ Complete              | Anonymous DCR enabled for zero-configuration clients
RFC 7636 | PKCE (Proof Key for Code Exchange) | ✅ Complete              | S256 challenge method mandatory for all clients
RFC 8414 | Authorization Server Metadata      | ✅ Complete              | OIDC discovery at /.well-known/openid-configuration
RFC 8707 | Resource Indicators                | ✅ Complete              | Via audience mapper workaround (native support in development)
RFC 9728 | Protected Resource Metadata        | ⚠️ MCP Server-dependent  | Implemented by MCP servers, not the AS
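
Since MCP clients start with metadata discovery, a quick way to confirm the realm advertises the expected endpoints is to query its OIDC discovery document (URL assumes the /auth relative path used in this deployment):

curl -s "https://auth.example.com/auth/realms/mcp/.well-known/openid-configuration" | \
  jq '{issuer, authorization_endpoint, token_endpoint, registration_endpoint, code_challenge_methods_supported}'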

Realm Configuration

The MCP realm (mcp-realm.tf) defines the security policies and token lifespans, tuned for AI model access patterns:

resource "keycloak_realm" "mcp" {
  realm   = "mcp"
  enabled = true

  display_name      = "MCP Authorization Server"
  display_name_html = "<b>Model Context Protocol</b>"

  # Token lifespans - optimized for MCP sessions
  access_token_lifespan        = "1h"      # Longer for AI workflows
  sso_session_idle_timeout     = "30m"
  sso_session_max_lifespan     = "10h"
  offline_session_idle_timeout = "720h"    # 30 days

  # Security policies
  ssl_required = "external"  # Require HTTPS for external connections

  password_policy = "length(12) and upperCase(1) and lowerCase(1) and digits(1) and specialChars(1)"

  security_defenses {
    headers {
      x_frame_options                     = "DENY"
      content_security_policy             = "frame-src 'self'; frame-ancestors 'self'; object-src 'none';"
      content_security_policy_report_only = ""
      x_content_type_options              = "nosniff"
      x_robots_tag                        = "none"
      x_xss_protection                    = "1; mode=block"
      strict_transport_security           = "max-age=31536000; includeSubDomains"
    }

    brute_force_detection {
      permanent_lockout                = false
      max_login_failures               = 5
      wait_increment_seconds           = 60
      quick_login_check_milli_seconds  = 1000
      minimum_quick_login_wait_seconds = 60
      max_failure_wait_seconds         = 900
      failure_reset_time_seconds       = 900
    }
  }
}

Client Scopes and Audience Mapper

The mcp:run client scope (mcp-scopes.tf) forms the core of the workaround, combining the scope definition with the critical audience mapper:

# Define the mcp:run client scope
resource "keycloak_openid_client_scope" "mcp_run" {
  realm_id               = keycloak_realm.mcp.id
  name                   = "mcp:run"
  description            = "Scope for MCP model execution with audience restriction"
  consent_screen_text    = "Access MCP model servers"
  include_in_token_scope = true
}

# Attach the audience mapper to mcp:run scope
resource "keycloak_openid_hardcoded_claim_protocol_mapper" "mcp_run_audience_mapper" {
  realm_id        = keycloak_realm.mcp.id
  client_scope_id = keycloak_openid_client_scope.mcp_run.id
  name            = "mcp-audience"

  claim_name       = "aud"
  claim_value      = var.resource_server_uri
  claim_value_type = "String"

  add_to_id_token     = false
  add_to_access_token = true
  add_to_userinfo     = false
}

# Make mcp:run a default scope for all clients
resource "keycloak_realm_default_client_scopes" "mcp_realm_defaults" {
  realm_id = keycloak_realm.mcp.id

  default_scopes = [
    "profile",
    "email",
    keycloak_openid_client_scope.mcp_run.name,  # ← Critical
    "roles",
    "web-origins",
    "acr",
    "basic",
  ]
}

Key Design Decision: The mapper specifically configures add_to_access_token = true and add_to_id_token = false. This intentional design ensures the aud claim is present in the access token (for resource server validation) but excluded from the ID token (consumed by the client for user information).
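
After a test login, this split can be confirmed by decoding both tokens from the same token response and comparing their aud claims. A small sketch (assumes the token endpoint's JSON response is in $TOKEN_RESPONSE; the helper function is illustrative):

# Illustrative helper: print the aud claim from a JWT's payload segment
jwt_aud() {
  local payload
  payload=$(printf '%s' "$1" | cut -d '.' -f2 | tr '_-' '/+')
  while [ $(( ${#payload} % 4 )) -ne 0 ]; do payload="${payload}="; done
  printf '%s' "$payload" | base64 -d | jq '.aud'
}

jwt_aud "$(echo "$TOKEN_RESPONSE" | jq -r '.access_token')"  # expect "https://mcp-server.example.com/mcp"
jwt_aud "$(echo "$TOKEN_RESPONSE" | jq -r '.id_token')"      # expect the client_id, not the MCP server URL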

Two-Phase Deployment Pattern

The Keycloak Terraform Provider currently exhibits a limitation: it cannot directly manage Client Registration Policies, which govern DCR behavior. This necessitates a hybrid deployment approach:

Phase 1: Terraform Resources (terraform apply)

This phase declaratively provisions the infrastructure:

  • Realm with defined security policies
  • Client scopes with embedded protocol mappers
  • Realm default scopes
  • Optional example clients

Phase 2: REST API Configuration (Bash scripts)

This phase configures imperative settings using the Keycloak Admin REST API:

  1. fix-allowed-scopes.sh: Modifies the Client Registration Policy to include mcp:run in the allowed scopes list.
  2. disable-trusted-hosts.sh: Removes the Trusted Hosts policy to accommodate custom redirect URI schemes (e.g., cursor://, vscode://, claude://).
  3. enable-dcr.sh: Verifies Dynamic Client Registration functionality and confirms proper scope inheritance.

The integrated deploy.sh orchestrator automates the execution of both phases:

#!/bin/bash
set -e

echo "Phase 1: Terraform deployment..."
terraform init
terraform apply -auto-approve

echo "Phase 2: REST API configuration..."
./fix-allowed-scopes.sh
./disable-trusted-hosts.sh

echo "Verification: Testing DCR..."
./enable-dcr.sh

echo "Deployment complete! MCP OAuth 2.1 realm ready."

Trusted Hosts Policy Removal

MCP clients frequently employ non-standard redirect URI schemes that Keycloak's default policies typically reject. The solution involves completely removing the Trusted Hosts policy component:

# Find the Trusted Hosts policy component
TRUSTED_HOSTS_ID=$(curl -s "${KEYCLOAK_URL}/admin/realms/mcp/components" \
  -H "Authorization: Bearer ${ADMIN_TOKEN}" | \
  jq -r '.[] | select(.name=="Trusted Hosts") | .id')

# Delete it entirely
curl -X DELETE "${KEYCLOAK_URL}/admin/realms/mcp/components/${TRUSTED_HOSTS_ID}" \
  -H "Authorization: Bearer ${ADMIN_TOKEN}"

Security Consideration: This action allows all redirect URI schemes, including http://localhost:* for development purposes. For production deployments safeguarding sensitive data, it is recommended to implement a custom policy that explicitly whitelists only approved schemes such as https://, cursor://, vscode://, and claude://.

Infrastructure Components

Beyond the OAuth configuration, several AWS infrastructure components provide essential support for the deployment.

JDBC_PING Clustering for ECS Fargate

Keycloak's native clustering mechanism, JGroups, typically relies on UDP multicast for node discovery. However, AWS VPCs do not support multicast, and ECS Fargate instances lack static IP addresses. The adopted solution is JDBC_PING, which utilizes the PostgreSQL database as a robust coordination mechanism.

How JDBC_PING Functions:

  1. Each Keycloak container registers its IP address and port within the JGROUPSPING table in PostgreSQL.
  2. Containers periodically query this table to discover active cluster members.
  3. Session data is replicated across the discovered cluster members.
  4. Upon container termination, its corresponding entry is gracefully removed from the table.

Configuration (cache-ispn-jdbc-ping.xml):

<config xmlns="urn:org:jgroups"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups-4.2.xsd">
    <TCP bind_addr="${jgroups.bind.address,jgroups.tcp.address:SITE_LOCAL}"
         bind_port="${jgroups.bind.port,jgroups.tcp.port:7800}"
         recv_buf_size="5m"
         send_buf_size="1m"
         max_bundle_size="64k"/>

    <JDBC_PING
        connection_driver="org.postgresql.Driver"
        connection_url="${env.KC_DB_URL}"
        connection_username="${env.KC_DB_USERNAME}"
        connection_password="${env.KC_DB_PASSWORD}"
        initialize_sql="CREATE TABLE IF NOT EXISTS JGROUPSPING (
            own_addr VARCHAR(200) NOT NULL,
            cluster_name VARCHAR(200) NOT NULL,
            ping_data BYTEA,
            constraint PK_JGROUPSPING PRIMARY KEY (own_addr, cluster_name)
        );"
        info_writer_sleep_time="500"
        remove_all_data_on_view_change="true"
        stack.combine="REPLACE"
        stack.position="MPING"/>

    <MERGE3 min_interval="10000" max_interval="30000"/>
    <FD_SOCK/>
    <FD_ALL timeout="60000" interval="15000"/>
    <VERIFY_SUSPECT timeout="5000"/>
    <pbcast.NAKACK2 use_mcast_xmit="false" xmit_interval="1000"/>
    <UNICAST3 xmit_interval="500"/>
    <pbcast.STABLE desired_avg_gossip="50000" max_bytes="8m"/>
    <pbcast.GMS print_local_addr="true" join_timeout="2000"/>
    <UFC max_credits="2m" min_threshold="0.4"/>
    <MFC max_credits="2m" min_threshold="0.4"/>
    <FRAG2 frag_size="60k"/>
</config>

Keycloak Container Configuration:

The Dockerfile integrates this configuration during the build process:

FROM quay.io/keycloak/keycloak:26.4.4 AS builder

ENV KC_HEALTH_ENABLED=true
ENV KC_METRICS_ENABLED=true
ENV KC_HTTP_RELATIVE_PATH=/auth
ENV KC_DB=postgres

# Copy JDBC_PING configuration
COPY ./cache-ispn-jdbc-ping.xml /opt/keycloak/conf/cache-ispn-jdbc-ping.xml

# Build optimized image
RUN /opt/keycloak/bin/kc.sh build --cache-config-file=cache-ispn-jdbc-ping.xml

FROM quay.io/keycloak/keycloak:26.4.4
COPY --from=builder /opt/keycloak /opt/keycloak

# JDBC_PING coordination port
EXPOSE 7800
ENTRYPOINT ["/opt/keycloak/bin/kc.sh"]

ECS Task Definition:

The ECS task definition exposes port 7800 to facilitate cluster communication:

{
  "containerDefinitions": [{
    "name": "keycloak",
    "image": "${ecr_repository_url}:${image_tag}",
    "portMappings": [
      {
        "containerPort": 8080,
        "protocol": "tcp"
      },
      {
        "containerPort": 7800,
        "protocol": "tcp"
      }
    ],
    "environment": [
      {"name": "KC_DB", "value": "postgres"},
      {"name": "KC_DB_URL", "value": "jdbc:postgresql://..."},
      {"name": "KC_PROXY_HEADERS", "value": "xforwarded"},
      {"name": "KC_CACHE_CONFIG_FILE", "value": "cache-ispn-jdbc-ping.xml"}
    ],
    "secrets": [
      {"name": "KC_DB_PASSWORD", "valueFrom": "arn:aws:ssm:..."},
      {"name": "KEYCLOAK_ADMIN_PASSWORD", "valueFrom": "arn:aws:ssm:..."}
    ],
    "healthCheck": {
      "command": ["CMD-SHELL", "curl -f http://localhost:8080/auth/health || exit 1"],
      "interval": 30,
      "timeout": 5,
      "retries": 3
    }
  }]
}

Result: ECS can dynamically scale tasks up or down. New containers automatically join the cluster, and terminated containers are gracefully removed, ensuring session data persistence across container restarts.

Aurora Serverless v2 Configuration

Aurora Serverless v2 offers a PostgreSQL-compatible database with sub-second scaling and granular pay-per-second billing:

module "aurora_postgresql" {
  source  = "terraform-aws-modules/rds-aurora/aws"
  version = "~> 8.0"

  name           = "keycloak-db"
  engine         = "aurora-postgresql"
  engine_version = "16.8"
  instance_class = "db.serverless"
  instances = {
    one = {}
    two = {}  # Multi-AZ for high availability
  }

  serverlessv2_scaling_configuration = {
    min_capacity = 0.5  # 1 GB RAM - minimal idle cost
    max_capacity = 2    # 4 GB RAM - handles production traffic
  }

  vpc_id               = module.vpc.vpc_id
  db_subnet_group_name = aws_db_subnet_group.aurora.name
  security_group_rules = {
    keycloak_ingress = {
      source_security_group_id = aws_security_group.keycloak_ecs.id
    }
  }

  storage_encrypted = true
  apply_immediately = true

  backup_retention_period = 7
  preferred_backup_window = "03:00-04:00"

  database_name   = "keycloak"
  master_username = "keycloak"
}

Scaling Behavior: Aurora Serverless v2 continuously monitors database load (CPU, connections, memory) and adjusts ACU capacity in fine-grained increments within seconds. Typical Keycloak workload scaling ranges are:

  • Idle/Development: 0.5 ACU (approximately $0.12/hour)
  • Normal Production: 1-1.5 ACU (approximately $0.24/hour)
  • High Load (authentication storms): 1.5-2 ACU (approximately $0.36/hour)

ECS Service Configuration

The ECS service is responsible for managing task placement and continuous health monitoring:

resource "aws_ecs_service" "keycloak" {
  name            = "keycloak"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.keycloak.arn
  desired_count   = var.desired_count  # 2 for HA

  launch_type      = "FARGATE"
  platform_version = "LATEST"

  deployment_maximum_percent         = 200  # Allow 2x capacity during updates
  deployment_minimum_healthy_percent = 100  # Always maintain full capacity
  health_check_grace_period_seconds  = 600  # Allow time for Keycloak startup

  network_configuration {
    subnets         = module.vpc.private_subnets
    security_groups = [aws_security_group.keycloak_ecs.id]
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.keycloak.arn
    container_name   = "keycloak"
    container_port   = 8080
  }

  depends_on = [aws_lb_listener.https]
}

Deployment Strategy: This configuration ensures zero-downtime updates through the following sequence:

  1. New tasks are initiated, temporarily increasing capacity to 200%.
  2. New tasks successfully pass health checks (Keycloak startup is allotted 600 seconds).
  3. Traffic is progressively diverted to the new tasks.
  4. Old tasks are gracefully drained and terminated.
  5. The system returns to its stable state of 100% capacity (2 tasks).
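
The rollout can be observed from the AWS CLI while it progresses; a brief sketch (cluster and service names match the Terraform above):

# Show old and new deployments side by side during the rolling update
aws ecs describe-services \
  --cluster keycloak-cluster \
  --services keycloak \
  --query 'services[0].deployments[].{status:status,desired:desiredCount,running:runningCount,taskDef:taskDefinition}' \
  --output table

# Block until all tasks of the new task definition pass their health checks
aws ecs wait services-stable --cluster keycloak-cluster --services keycloak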

Deployment Walkthrough

This section provides a comprehensive, step-by-step guide for deploying the Keycloak infrastructure.

Prerequisites

Local Tools:

  • Terraform >= 1.0
  • AWS CLI v2, configured with appropriate credentials
  • Docker (for building container images)
  • jq (for JSON parsing in scripts)
  • make (optional, for simplified command execution)

AWS Permissions:

  • Creation of VPC, Subnet, Security Group, and NAT Gateway resources.
  • Provisioning of RDS Aurora cluster and instances.
  • Management of ECS cluster, task definitions, and services.
  • Creation of ECR repositories and pushing container images.
  • Creation of IAM roles for ECS task execution.
  • Read/write access to SSM Parameter Store.
  • Creation of ACM certificates (or access to an existing certificate ARN).

Step 1: Clone and Initialize Infrastructure

# Clone the repository
git clone https://github.com/your-org/terraform-keycloak-aws.git
cd terraform-keycloak-aws

# Create a new environment
cp -r environments/template environments/production
cd environments/production

# Configure terraform.tfvars
cat > terraform.tfvars <<EOF
aws_region          = "us-east-1"
environment         = "production"
vpc_cidr            = "10.0.0.0/16"
availability_zones  = ["us-east-1a", "us-east-1b"]

# Start with 0 to avoid costs during initial setup
desired_count       = 0

# Use existing ACM certificate or create new one
certificate_arn     = "arn:aws:acm:us-east-1:ACCOUNT:certificate/CERT_ID"
domain_name         = "auth.example.com"

# Database configuration
db_instance_class   = "db.serverless"
db_allocated_storage = 20
db_engine_version   = "16.8"

# Aurora Serverless v2 scaling
aurora_serverless_min_capacity = 0.5
aurora_serverless_max_capacity = 2
EOF

# Initialize and create infrastructure (no running tasks yet)
make all
# Or manually:
# terraform init
# terraform plan
# terraform apply

Result: The VPC, subnets, NAT gateways, Aurora RDS, ECS cluster, and ALB are successfully provisioned. No ECS tasks are yet operational.

Step 2: Build and Push Container Image

cd ../../build/keycloak

# Configure environment
export AWS_REGION=us-east-1
export ENV_NAME=production

# Build and push (uses Makefile automation)
make all

# Or manually:
# aws ecr get-login-password --region us-east-1 | \
#   docker login --username AWS --password-stdin $(aws sts get-caller-identity --query Account --output text).dkr.ecr.us-east-1.amazonaws.com
# docker build -t keycloak-mcp:latest .
# docker tag keycloak-mcp:latest ECR_URL:latest
# docker push ECR_URL:latest

Result: A custom Keycloak container image, configured with JDBC_PING clustering, is built and pushed to Amazon ECR.

Step 3: Scale Up ECS Service

cd ../../environments/production

# Update terraform.tfvars (pattern tolerates the alignment whitespace)
sed -i 's/desired_count *= 0/desired_count       = 2/' terraform.tfvars

# Apply changes
make update
# Or: terraform apply

Result: Two Keycloak containers are launched within private subnets, establish a cluster via JDBC_PING, and register with the ALB. Keycloak becomes accessible at https://auth.example.com/auth.
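
A quick smoke test against the built-in health endpoint confirms the tasks are reachable through the ALB (path reflects KC_HTTP_RELATIVE_PATH=/auth and KC_HEALTH_ENABLED=true from the container build):

curl -s https://auth.example.com/auth/health | jq '.'
# Expected: {"status": "UP", ...}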

Step 4: Create Admin User

# Get admin password from SSM Parameter Store
ADMIN_PASSWORD=$(aws ssm get-parameter \
  --name "/keycloak/production/admin_password" \
  --with-decryption \
  --query Parameter.Value \
  --output text)

echo "Admin URL: https://auth.example.com/auth/admin"
echo "Username: admin"
echo "Password: ${ADMIN_PASSWORD}"

Log in to the Keycloak admin console to verify the deployment's integrity.

Step 5: Configure MCP OAuth Realm

cd mcp-oauth

# Auto-generate configuration from parent deployment
./init-from-parent.sh --mcp-server-url "https://mcp-server.example.com/mcp"

# Review generated terraform.tfvars
cat terraform.tfvars

# Deploy MCP OAuth realm (Terraform + REST API)
make deploy

# Verify Dynamic Client Registration
./enable-dcr.sh

Result: The MCP realm is successfully created and configured with:

  • An mcp:run client scope, incorporating the audience mapper.
  • Properly configured realm default scopes.
  • Enabled and verified Dynamic Client Registration.
  • Removal of the Trusted Hosts policy.

Step 6: Test with MCP Client

Configure an MCP client (e.g., Claude Code, Cursor, VS Code) to establish a connection:

MCP Server Configuration Example:

{
  "servers": {
    "my-mcp-server": {
      "url": "https://mcp-server.example.com/mcp",
      "auth": {
        "type": "oauth",
        "authorizationUrl": "https://auth.example.com/auth/realms/mcp/protocol/openid-connect/auth",
        "tokenUrl": "https://auth.example.com/auth/realms/mcp/protocol/openid-connect/token"
      }
    }
  }
}

Expected Flow:

  1. The MCP client discovers Authorization Server metadata from Keycloak's OIDC discovery endpoint.
  2. The client dynamically registers via DCR, obtaining a client_id.
  3. The client initiates the Authorization Code flow with PKCE.
  4. The user authenticates through Keycloak.
  5. The client receives a JWT access token containing aud: "https://mcp-server.example.com/mcp".
  6. The MCP server validates the token's aud claim and grants access to MCP resources.
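
Steps 5 and 6 can also be exercised manually against the MCP server once a token has been obtained (sketch; $ACCESS_TOKEN is the token issued to the client, and exact status codes depend on the MCP server's transport):

# Without a token the MCP server should refuse the request
curl -s -o /dev/null -w 'no token:   %{http_code}\n' \
  https://mcp-server.example.com/mcp

# With the audience-restricted access token it should be accepted
curl -s -o /dev/null -w 'with token: %{http_code}\n' \
  -H "Authorization: Bearer ${ACCESS_TOKEN}" \
  https://mcp-server.example.com/mcp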

Deployment Considerations

High Availability Configuration

The deployment is engineered for high availability, incorporating redundancy across multiple layers:

Multi-AZ Distribution:

  • ECS Tasks: Distributed across two or more Availability Zones using ECS placement strategies.
  • Aurora: Multi-AZ cluster with automated failover, achieving a Recovery Time Objective (RTO) typically under 60 seconds.
  • ALB: Employs cross-zone load balancing by default.
  • NAT Gateways: One per AZ (total of two) for independent outbound connectivity.

Failure Scenarios:

  • Single ECS Task Failure: The ALB reroutes traffic to healthy tasks, and ECS automatically initiates a replacement.
  • Availability Zone Failure: Aurora promotes a replica, and ECS tasks in other AZs continue to serve traffic.
  • Database Primary Failure: Aurora automatically fails over to a replica in a different AZ.

Zero-Downtime Deployments:

The deployment_minimum_healthy_percent = 100 configuration ensures continuous full capacity during service updates:

Initial state:     [Task1] [Task2]                   (100% capacity)
Update triggered:  [Task1] [Task2] [Task3] [Task4]   (200% capacity)
Health checks:     [Task1] [Task2] [Task3✓] [Task4✓]
Drain old:         [Task3✓] [Task4✓]
Final state:       [Task3] [Task4]                   (100% capacity)

Monitoring and Observability

CloudWatch Logs:

All container output is streamed to CloudWatch Logs with configurable retention policies:

resource "aws_cloudwatch_log_group" "keycloak" {
  name              = "/ecs/keycloak"
  retention_in_days = 30
}

Key Logs to Monitor:

  • Authentication failures: Search for WARN.*org.keycloak.events.
  • Database errors: Search for ERROR.*Hibernate.
  • Clustering issues: Search for WARN.*JGroups or JDBC_PING.
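
These patterns can be queried directly with the AWS CLI, for example (log group name matches the Terraform above; the filter pattern is illustrative):

# Pull the last hour of authentication warnings from the Keycloak log group
aws logs filter-log-events \
  --log-group-name /ecs/keycloak \
  --start-time $(( ($(date +%s) - 3600) * 1000 )) \
  --filter-pattern '"WARN" "org.keycloak.events"' \
  --query 'events[].message' \
  --output text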

Container Insights:

ECS Container Insights should be enabled for comprehensive cluster-level metrics:

resource "aws_ecs_cluster" "main" {
  name = "keycloak-cluster"

  setting {
    name  = "containerInsights"
    value = "enabled"
  }
}

This provides metrics for:

  • CPU and memory utilization per task.
  • Network ingress/egress.
  • Task startup and health check durations.

Keycloak Built-in Endpoints:

  • Health: https://auth.example.com/auth/health → Returns {"status": "UP"}.
  • Metrics (Prometheus): https://auth.example.com/auth/metrics → Offers detailed application metrics.
  • Server Info: Accessible via Admin Console → Server Info → provides version, clustering status, and memory usage.

Recommended Alarms:

resource "aws_cloudwatch_metric_alarm" "ecs_cpu_high" {
  alarm_name          = "keycloak-cpu-high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "CPUUtilization"
  namespace           = "AWS/ECS"
  period              = "300"
  statistic           = "Average"
  threshold           = "80"
  alarm_description   = "ECS CPU utilization is too high"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    ClusterName = aws_ecs_cluster.main.name
    ServiceName = aws_ecs_service.keycloak.name
  }
}

resource "aws_cloudwatch_metric_alarm" "aurora_cpu_high" {
  alarm_name          = "keycloak-db-cpu-high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "CPUUtilization"
  namespace           = "AWS/RDS"
  period              = "300"
  statistic           = "Average"
  threshold           = "80"
  alarm_description   = "Aurora CPU utilization is too high"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    DBClusterIdentifier = module.aurora_postgresql.cluster_id
  }
}

Security Best Practices

Encryption Everywhere:

  • ALB: Enforces TLS 1.2+ with ACM certificates.
  • RDS: AES-256 encryption at rest, managed by KMS.
  • ECR: Encrypted container images.
  • Parameter Store: SecureString encryption for sensitive credentials.
  • In-transit: All external communication is secured via HTTPS.

Secrets Management:

Sensitive values are securely stored in AWS Systems Manager Parameter Store:

# Store admin password
aws ssm put-parameter \
  --name "/keycloak/production/admin_password" \
  --value "$(openssl rand -base64 32)" \
  --type SecureString

# Store database password
aws ssm put-parameter \
  --name "/keycloak/production/db_password" \
  --value "$(openssl rand -base64 32)" \
  --type SecureString

ECS tasks retrieve secrets at runtime through IAM role permissions, eliminating hardcoded credentials in task definitions or source code.

Network Isolation:

  • ECS Tasks: Confined to private subnets (no direct internet access).
  • RDS: Located in private subnets, accessible only from the ECS security group.
  • ALB: Deployed in public subnets (internet-facing).
  • NAT Gateways: Reside in public subnets, providing outbound-only internet access for private subnets.

Security Group Rules:

# ALB security group - allow HTTPS from anywhere
resource "aws_security_group" "alb" {
  name_prefix = "keycloak-alb-"
  vpc_id      = module.vpc.vpc_id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# ECS security group - allow traffic only from ALB
resource "aws_security_group" "keycloak_ecs" {
  name_prefix = "keycloak-ecs-"
  vpc_id      = module.vpc.vpc_id

  ingress {
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }

  # Allow clustering between ECS tasks
  ingress {
    from_port = 7800
    to_port   = 7800
    protocol  = "tcp"
    self      = true
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# RDS security group - allow traffic only from ECS
resource "aws_security_group" "aurora" {
  name_prefix = "keycloak-aurora-"
  vpc_id      = module.vpc.vpc_id

  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.keycloak_ecs.id]
  }
}

Troubleshooting Common Issues

Issue 1: DCR Clients Missing mcp:run Scope

Symptoms: Dynamically registered clients receive tokens without the aud claim, or with aud: [] (an empty array).

Root Cause: The mcp:run scope is not configured as a realm default scope, or the Client Registration Policy does not permit it.

Solution:

# Verify realm default scopes
curl -s "${KEYCLOAK_URL}/admin/realms/mcp" \
  -H "Authorization: Bearer ${ADMIN_TOKEN}" | \
  jq '.defaultDefaultClientScopes'

# Expected output should include "mcp:run".
# If missing, recreate the realm or manually add via the admin console:
# Realm Settings → Client Scopes → Default Client Scopes → Add "mcp:run".

# Verify Client Registration Policy allows mcp:run
cd environments/production/mcp-oauth
./fix-allowed-scopes.sh

Issue 2: Trusted Hosts Policy Blocking Custom Redirect URIs

Symptoms: Dynamic Client Registration succeeds, but authorization requests fail with an "Invalid redirect URI" error for schemes like cursor:// or vscode://.

Root Cause: Keycloak's Trusted Hosts policy defaults to rejecting non-HTTPS schemes.

Solution:

cd environments/production/mcp-oauth
./disable-trusted-hosts.sh

Verify within the admin console: Client Registration → Policies → (confirm "Trusted Hosts" policy is absent).

Issue 3: Clustering Failures ("Split Brain")

Symptoms: Users experience inconsistent authentication states, or sessions unexpectedly expire. Logs display repeated VIEW_CHANGE messages or WARN.*JDBC_PING errors.

Root Cause: JDBC_PING communication issues, typically due to:

  • Database connectivity problems.
  • Security group rules blocking port 7800.
  • Multiple tasks simultaneously attempting to write to the JGROUPSPING table.

Solution:

# Check JGROUPSPING table
psql -h AURORA_ENDPOINT -U keycloak -d keycloak -c "SELECT * FROM JGROUPSPING;"

# The table should display one row per running ECS task.
# If empty or stale, investigate:
# 1. Ensure the security group permits port 7800 between ECS tasks.
# 2. Verify the KC_DB_URL environment variable is correct.
# 3. Confirm database credentials are valid.

# Restart the ECS service to force a cluster re-join
aws ecs update-service \
  --cluster keycloak-cluster \
  --service keycloak \
  --force-new-deployment

Issue 4: ALB Health Checks Failing

Symptoms: ECS tasks initiate, pass initial health checks, but then repeatedly fail and restart.

Root Cause: The health check path /auth/health may not respond promptly during Keycloak startup (which can take 60-120 seconds), or the health check interval is too aggressive.

Solution:

Increase the health check grace period:

resource "aws_ecs_service" "keycloak" {
  # ...
  health_check_grace_period_seconds = 600  # 10 minutes
}

Alternatively, utilize a more reliable health check path:

resource "aws_lb_target_group" "keycloak" {
  # ...
  health_check {
    enabled             = true
    path                = "/auth/realms/master"  # More reliable than /auth/health
    port                = "traffic-port"
    protocol            = "HTTP"
    timeout             = 5
    interval            = 30
    healthy_threshold   = 2
    unhealthy_threshold = 3
    matcher             = "200"
  }
}

Conclusion

This guide has walked through configuring Keycloak as an MCP-compatible OAuth 2.1 authorization server. By leveraging Keycloak's protocol mapper extensibility and realm configuration, the solution addresses the platform's lack of native RFC 8707 support while enabling a zero-configuration experience for MCP clients.

Key Implementation Takeaways:

  1. Audience Mapper Workaround: Custom protocol mappers are employed to inject the requisite aud claim into JWT access tokens, compensating for Keycloak's inherent RFC 8707 limitations.
  2. Realm Default Scopes: The configuration of mcp:run as a realm default scope ensures that all dynamically registered clients automatically inherit the audience mapper.
  3. Two-Phase Configuration: A hybrid approach, combining Terraform resources with REST API configuration, effectively navigates the current limitations of the Keycloak Terraform provider concerning Client Registration Policies.
  4. Automated Infrastructure: Terraform modules facilitate repeatable AWS deployments, incorporating ECS Fargate, Aurora PostgreSQL, and essential networking components.

When to Choose Keycloak for MCP:

  • Self-hosted Requirements: When on-premises or private cloud deployment is mandated.
  • Keycloak-Specific Features: When advanced features like user federation, identity brokering, or custom authentication flows are necessary.
  • Cost Considerations: For scenarios prioritizing open-source licensing with infrastructure-only costs.
  • Customization Needs: When full control over authentication flows and user management is essential.

Alternative Solutions for Consideration:

For organizations preferring managed identity solutions, IDaaS providers offering native RFC 8707 support should be evaluated:

  • Amazon Cognito: Provides native RFC 8707 support, though it requires custom implementation for the DCR endpoint.
  • Ping Identity: Offers comprehensive RFC 8707 and RFC 7591 compliance (as detailed in the MCP authorization compatibility analysis).

For a deeper technical understanding of the OAuth 2.1 specifications and RFC requirements underpinning MCP authorization, refer to my comprehensive analysis: Technical Deconstruction of MCP Authorization: A Deep Dive into OAuth 2.1 and IETF RFC Specifications.

The complete Terraform configuration, Dockerfile, and deployment automation scripts are available in the terraform-keycloak-aws repository.

Resources & References

Official Documentation

OAuth and MCP Specifications

GitHub Repository

  • terraform-keycloak-aws: Complete source code for this deployment, including Terraform modules, Dockerfile, and deployment automation scripts.